mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

Ask about the error message and the problem of networking: Reason: Couldn't connect to server #188

Open l1y1y opened 1 month ago

l1y1y commented 1 month ago

Hi olivertam, I encountered the following problem when running TEtranscripts on a supercomputer cluster. Could you please help me solve it?

# GTF file =Mus_musculus.GRCm39.112.gtf 
# TE file =GRCm39_GENCODE_rmsk_TE.gtf 
# multi-mapper mode = multi 
# stranded = no
# differential analysis using DESeq2
# normalization = DESeq2_default
# FDR cutoff = 5.00e-02
# fold-change cutoff =  1.00
# read count cutoff = 1
# number of iteration = 100
# Alignments grouped by read ID = True

INFO  @ Fri, 24 May 2024 16:25:14: Processing GTF files ...

INFO  @ Fri, 24 May 2024 16:25:14: Building gene index ....... 

100000 GTF lines processed.
200000 GTF lines processed.
300000 GTF lines processed.
400000 GTF lines processed.
500000 GTF lines processed.
600000 GTF lines processed.
700000 GTF lines processed.
800000 GTF lines processed.
INFO  @ Fri, 24 May 2024 16:47:32: Done building gene index ...... 

INFO  @ Fri, 24 May 2024 16:49:51: Building TE index ....... 

INFO  @ Fri, 24 May 2024 16:54:41: Done building TE index ...... 

INFO  @ Fri, 24 May 2024 16:54:41: 
Reading sample files ... 

slurmstepd: error: acct_gather_profile/influxdb _send_data: curl_easy_perform failed to send data (discarded). Reason: Couldn't connect to server.

Finally, I would also like to know why only some of the Ensembl IDs in the generated cntTable file have a one-to-one correspondence with org.Mm.eg.db. I am very much looking forward to your reply. Thank you.

olivertam commented 1 month ago

Hi,

It looks like the error message is related to your server, so might be outside our software's control. Are you using a Docker/Singularity container? Or installing it from source?

Could you explain your second question in more detail? I'm not sure what you mean.

Also, could you confirm that you're mapping against GENCODE's genome FASTA, and not Ensembl's? If you're using Ensembl's genome FASTA, you would need the GRCm39_Ensembl_rmsk_TE.gtf instead.

Thanks.
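
The GENCODE versus Ensembl distinction matters largely because the two sources name sequences differently: GENCODE uses a `chr` prefix (`chr1`), while Ensembl does not (`1`). A minimal sketch of a consistency check between a gene GTF and a TE GTF follows. The function names and the toy records are hypothetical and stand in for the real files; this is not part of TEtranscripts.

```python
import io

def seqnames(gtf_handle):
    """Collect the set of sequence names (column 1) from a GTF stream."""
    names = set()
    for line in gtf_handle:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names

def naming_style(names):
    """Classify a set of sequence names by whether they carry a 'chr' prefix."""
    if not names:
        return "empty"
    prefixed = {n for n in names if n.startswith("chr")}
    if prefixed == names:
        return "chr-prefixed"
    if not prefixed:
        return "unprefixed"
    return "mixed"

# Toy single-record inputs (assumed for illustration, not taken from the real files).
gene_gtf = io.StringIO(
    '1\tensembl\tgene\t100\t200\t.\t+\t.\tgene_id "ENSMUSG00000000001";\n'
)
te_gtf = io.StringIO(
    'chr1\trmsk\texon\t300\t400\t.\t+\t.\tgene_id "L1Md_A";\n'
)

gene_style = naming_style(seqnames(gene_gtf))
te_style = naming_style(seqnames(te_gtf))
if gene_style != te_style:
    print(f"Naming mismatch: gene GTF is {gene_style}, TE GTF is {te_style}")
```

On real data, the handles would be opened with `open(path)` for each GTF. A mismatch suggests that the TE GTF and the gene GTF come from different sources.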

l1y1y commented 1 month ago

Thank you. The software does not need to connect to the Internet while it is running, right? Yes, I should use GRCm39_Ensembl_rmsk_TE.gtf. My second question concerns the Ensembl IDs in the cntTable file I generated. When I used org.Mm.eg.db to convert the gene IDs, some of them had no match and could not be converted. Maybe I used the wrong reference file.
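
One possible reason for unconverted IDs, which the thread does not confirm, is that the count table mixes gene rows with TE rows, and a gene annotation package has no entries for TEs. The sketch below separates the two kinds of row identifier before any conversion is attempted. It assumes that TE rows are written as `name:family:class` and that gene rows carry mouse Ensembl gene IDs, optionally with a version suffix. The function name and the example IDs are illustrative only.

```python
import re

# Mouse Ensembl gene IDs, optionally followed by a ".<version>" suffix.
ENSEMBL_MOUSE_GENE = re.compile(r"^ENSMUSG\d+(\.\d+)?$")

def split_ids(row_ids):
    """Split count-table row IDs into gene IDs, TE IDs, and everything else."""
    genes, tes, other = [], [], []
    for rid in row_ids:
        rid = rid.strip().strip('"')  # tolerate quoted IDs
        if ENSEMBL_MOUSE_GENE.match(rid):
            # Drop the version suffix, which annotation lookups usually do not expect.
            genes.append(rid.split(".")[0])
        elif rid.count(":") == 2:
            # Assumed TE format: name:family:class
            tes.append(rid)
        else:
            other.append(rid)
    return genes, tes, other

# Illustrative IDs (assumed, not copied from the user's output).
example_ids = [
    '"ENSMUSG00000000001"',
    "ENSMUSG00000000028.16",
    "L1Md_A:L1:LINE",
    "B1_Mus1:Alu:SINE",
]
genes, tes, other = split_ids(example_ids)
print(len(genes), "gene IDs,", len(tes), "TE IDs,", len(other), "unrecognised")
```

Only the `genes` list would be passed on to an ID conversion step. Even then, some Ensembl genes may have no counterpart in the target annotation, so a few unmatched IDs are not necessarily a sign of a wrong reference file.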

l1y1y commented 1 month ago

Oh, I installed it from source on a supercomputer cluster. Jobs have to run on the compute nodes, which cannot connect to the Internet. I don't know whether the error occurs because the software requires an Internet connection during use. Thank you very much for your answer, and I wish you success in your research.

olivertam commented 1 month ago

Hi,

Thanks for your clarification. The software should not need to access the internet, so I'm surprised to see the message. Admittedly, all the computing nodes that we have tested on have internet access, so that has never been an issue for us.

Could you provide the command line that you used for your run, and we can see if we can replicate your issue on a computer with no internet access?

Thanks.

l1y1y commented 1 month ago

Okay, I'll send you my script. Could you please help me take a look at it? Thank you very much.

#!/bin/bash
#SBATCH --job-name=te_jobs
#SBATCH --partition=bingxing
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=110G
#SBATCH --cpus-per-task=1
#SBATCH --output=all_teoutput%j.txt
#SBATCH --error=all_teerror%j.txt

source /public/software/apps/anaconda3/5.2.0/etc/profile.d/conda.sh

treatment_samples1="/public/home/zzs000213/zgsj/CRA002561/CRR129819/output.bam"
treatment_samples2="/public/home/zzs000213/zgsj/CRA002561/CRR129820/output.bam"
treatment_samples3="/public/home/zzs000213/zgsj/CRA002561/CRR129821/output.bam"
treatment_samples4="/public/home/zzs000213/zgsj/CRA002561/CRR129822/output.bam"
control_samples1="/public/home/zzs000213/zgsj/CRA002561/CRR129823/output.bam"
control_samples2="/public/home/zzs000213/zgsj/CRA002561/CRR129824/output.bam"
control_samples3="/public/home/zzs000213/zgsj/CRA002561/CRR129825/output.bam"
control_samples4="/public/home/zzs000213/zgsj/CRA002561/CRR129826/output.bam"

gene_gtf="/public/home/zzs000213/zgsj/Mus_musculus.GRCm39.112.gtf"
te_gtf="/public/home/zzs000213/zgsj/GRCm39_Ensembl_rmsk_TE.gtf"
project_name="all_sample_nosort_test"

echo "Begin execution TEtranscripts..."

TEtranscripts --format BAM --mode multi \
    -t $treatment_samples1 $treatment_samples2 $treatment_samples3 $treatment_samples4 \
    -c $control_samples1 $control_samples2 $control_samples3 $control_samples4 \
    --GTF $gene_gtf --TE $te_gtf --project $project_name

echo "TEtranscripts Execution completed"

conda deactivate

olivertam commented 1 month ago

Hi,

That is really odd. I did a test run with similar parameters on a computer not connected to the internet, and it ran fine. Right now, I can't replicate your error, nor do I have a clear idea why it's doing what it's doing. Have you tried running it again? I wonder if there was a temporary issue, or if it's reproducible. Did you install TEtranscripts via conda? Just curious why you have to activate the base conda environment (where typically you would create a separate environment for a software to prevent conflicts).

Thanks.

l1y1y commented 1 month ago

Sorry for the late reply; I had some other things to deal with. Regarding this issue, I tried running it twice and encountered the same error both times. Thank you for your reply. I also tried once with the conda environment deactivated; activating conda is just a bad habit of mine.

olivertam commented 1 month ago

Hi,

I'm afraid that I can't reproduce the error, and I don't know the exact cause. Can you confirm that none of the output was generated (e.g. ${project_name}.cntTable)? I looked more into it, and it appears to be an error stemming from SLURM trying to gather cluster parameters and putting them in an influxdb (which might be what is inaccessible). It might be worth passing on the error message to your SLURM admins:

slurmstepd: error: acct_gather_profile/influxdb _send_data: curl_easy_perform failed to send data (discarded). Reason: Couldn't connect to server.

Thanks
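
The two checks suggested above can be sketched as follows: separate scheduler messages (lines starting with `slurmstepd:`) from the tool's own log lines, and test whether the count table was written. The function names are hypothetical, and the `<project>.cntTable` naming follows the `${project_name}.cntTable` example mentioned in this comment.

```python
from pathlib import Path

def triage_log(log_text):
    """Split a job log into scheduler lines (slurmstepd) and all other lines."""
    scheduler, tool = [], []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        if line.startswith("slurmstepd:"):
            scheduler.append(line)
        else:
            tool.append(line)
    return scheduler, tool

def output_exists(project_name, directory="."):
    """Return True if <project_name>.cntTable is present in the given directory."""
    return (Path(directory) / f"{project_name}.cntTable").is_file()

# Log excerpt taken from the report at the top of this thread.
sample_log = """INFO  @ Fri, 24 May 2024 16:54:41:
Reading sample files ...
slurmstepd: error: acct_gather_profile/influxdb _send_data: curl_easy_perform failed to send data (discarded). Reason: Couldn't connect to server.
"""

scheduler_lines, tool_lines = triage_log(sample_log)
print(f"{len(scheduler_lines)} scheduler line(s), {len(tool_lines)} tool line(s)")
```

If the only error lines come from `slurmstepd` and the count table is missing, the job was probably interrupted on the scheduler side rather than by the software itself.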

l1y1y commented 1 month ago

Yes, you're right. I'm in communication with the administrator of my server, and they will arrange for an engineer to check if it's a server issue. After the check, I'll try running it again and will update you with the results. Thank you for your assistance.

l1y1y commented 3 weeks ago

Hi, I'm very sorry; the problem was with my server, not the software. Thank you very much for your help.