vpc-ccg / circminer

A sensitive and fast tool for circular RNA detection from RNA-Seq data
GNU General Public License v3.0

segmentation fault (core dumped) #17

Open nliorni opened 1 year ago

nliorni commented 1 year ago

Greetings, I cloned the latest version of circminer, completed the installation (make), and built the index for the hg38 reference genome. I keep getting the same error after the circRNA detection is completed:

/usr/bin/bash: line 1: 305631 Segmentation fault (core dumped)

The "Segmentation fault (core dumped)" pops up also when calling "circminer --help", like:

... For more details and command line options run "circminer --help" Segmentation fault (core dumped)

How can I solve the issue? Thanks in advance for help

fhach commented 1 year ago

seems like an installation error. @yenyilin, what do you think?

nliorni commented 1 year ago

I get the same error with the Bioconda installation.

yenyilin commented 1 year ago

circminer --help

This has happened since 1de79ac. It seems to come from the destructor of FASTQParser.

yenyilin commented 1 year ago

(1) Can you confirm that the 1600 pairs of reads in the test sample finish properly?
(2) In that case, do you mind sharing the fastq file with us?
(2.1) Otherwise, can you share your commands (number of reads, and whether you provide a fastq file or a device for piping)?
(3) While it should not affect your result, we understand the segmentation fault is annoying. The easiest workaround is to run lines 16-23 of fastq_parser.cpp only when current_record != NULL.

yenyilin commented 1 year ago

@nliorni: We cannot reproduce the segmentation fault after the detection step yet, and we would appreciate any further information about this scenario. The current workaround is to add if ( NULL != current_record){ lines 16-25 } in fastq_parser.cpp, so that circminer will not try to free an unallocated current_record.

We will update once we ensure that we fix the reproduced segmentation fault after the detection step. Thank you.
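The workaround above can be illustrated with a minimal, self-contained sketch. It assumes a parser whose per-thread record array is only allocated once input is opened, so a run such as "circminer --help" would reach the destructor with a null array. ToyParser, Record, open() and the two helper functions are hypothetical names for illustration, not circminer's actual code.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-ins for illustration; not circminer's actual types.
struct Record {
    char* rname;
    char* seq;
};

class ToyParser {
public:
    // The record array starts out unallocated (assumption: as for "--help").
    explicit ToyParser(int thread_count)
        : thread_count_(thread_count), records_(nullptr) {}

    // Allocate one record per thread, as a real run might before reading.
    void open() {
        records_ = new Record[thread_count_];
        for (int i = 0; i < thread_count_; ++i) {
            records_[i].rname = static_cast<char*>(std::malloc(16));
            records_[i].seq = static_cast<char*>(std::malloc(16));
        }
    }

    bool allocated() const { return records_ != nullptr; }

    ~ToyParser() {
        // Without this guard, records_[i] would dereference a null pointer
        // whenever open() was never called.
        if (records_ != nullptr) {
            for (int i = 0; i < thread_count_; ++i) {
                std::free(records_[i].rname);
                std::free(records_[i].seq);
            }
            delete[] records_;
        }
    }

private:
    int thread_count_;
    Record* records_;
};

// Destroys a parser that never allocated its records.
bool destroy_without_open(int thread_count) {
    ToyParser parser(thread_count);
    return !parser.allocated();
}

// Destroys a parser after its records were allocated.
bool destroy_after_open(int thread_count) {
    ToyParser parser(thread_count);
    parser.open();
    return parser.allocated();
}
```

Note that the guard only helps if the pointer is reliably initialized to null; free(NULL) and delete[] on a null pointer are harmless by themselves, so the crash in this sketch would come from indexing into the null array.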

nliorni commented 1 year ago

good morning! Thank you for the kind answer. The command I ran is:

circminer --verbosity 1 --thread 20 -r hg38.fa -g gencode36.gtf -1 sample_1.fastq -2 sample_2.fastq --output /path/to/output

How can I share the fastq files with you? They are gigabytes in size.

The segmentation fault error is the following:

Tue Feb 28 09:57:48 2023 [INFO] Number of threads: 1
Tue Feb 28 09:57:48 2023 [INFO] Input file type: Paired-end
Tue Feb 28 09:57:49 2023 [INFO] Kmer size obtained from index: 20
Tue Feb 28 09:57:49 2023 [INFO] Loading GTF file...
Tue Feb 28 09:58:24 2023 [INFO] Completed! (CPU time: 35.62s; Real time: 35.85s)
Tue Feb 28 09:58:24 2023 [INFO] Genome index type: Full
Tue Feb 28 09:58:24 2023 [INFO] Starting read extraction
Tue Feb 28 09:58:24 2023 [INFO] + Loading genome index...
Tue Feb 28 09:58:33 2023 [INFO] + Completed! (CPU time: 8.51s; Real time: 8.63s)
Tue Feb 28 09:58:33 2023 [INFO] + Loading genome sequence...
Tue Feb 28 09:58:36 2023 [INFO] + Completed! (CPU time: 3.22s; Real time: 3.24s)
Tue Feb 28 09:58:36 2023 [INFO] + Starting pseudo-alignment (Round 1)
Tue Feb 28 11:21:25 2023 [INFO] + Completed round 1! (CPU time: 3855.33s; Real time: 4968.88s)
Tue Feb 28 11:21:25 2023 [INFO] + Loading genome index...
Tue Feb 28 11:21:32 2023 [INFO] + Completed! (CPU time: 6.97s; Real time: 7.03s)
Tue Feb 28 11:21:32 2023 [INFO] + Loading genome sequence...
Tue Feb 28 11:21:34 2023 [INFO] + Completed! (CPU time: 2.47s; Real time: 2.49s)
Tue Feb 28 11:21:34 2023 [INFO] + Starting pseudo-alignment (Round 2)
Tue Feb 28 12:18:26 2023 [INFO] + Completed round 2! (CPU time: 3367.59s; Real time: 3411.76s)
Tue Feb 28 12:18:26 2023 [INFO] + Loading genome index...
Tue Feb 28 12:18:33 2023 [INFO] + Completed! (CPU time: 6.47s; Real time: 6.53s)
Tue Feb 28 12:18:33 2023 [INFO] + Loading genome sequence...
Tue Feb 28 12:18:35 2023 [INFO] + Completed! (CPU time: 2.46s; Real time: 2.48s)
Tue Feb 28 12:18:35 2023 [INFO] + Starting pseudo-alignment (Round 3)
Tue Feb 28 13:26:56 2023 [INFO] + Completed round 3! (CPU time: 4078.03s; Real time: 4100.55s)
Tue Feb 28 13:26:56 2023 [INFO] Starting circRNA detection
Tue Feb 28 13:26:56 2023 [INFO] + Sorting remaining read mappings using GNU sort...
Tue Feb 28 13:27:18 2023 [INFO] + Completed! (CPU time: 0.00s; Real time: 22.18s)
Tue Feb 28 13:27:18 2023 [INFO] + Loading genome sequence...
Tue Feb 28 13:27:22 2023 [INFO] + Completed! (CPU time: 3.82s; Real time: 3.84s)
Tue Feb 28 13:29:12 2023 [INFO] + Loading genome sequence...
Tue Feb 28 13:29:15 2023 [INFO] + Completed! (CPU time: 3.39s; Real time: 3.41s)
Tue Feb 28 13:30:34 2023 [INFO] + Loading genome sequence...
Tue Feb 28 13:30:37 2023 [INFO] + Completed! (CPU time: 3.39s; Real time: 3.41s)
/usr/bin/bash: line 1: 558958 Segmentation fault (core dumped) circminer --verbosity 1 -r /data/reference_data/hg38/genome/hg38_ucsc_filtered.fa -g /data/reference_data/hg38/annotations/gencode.v36.basic.annotation.gtf -1 /data2/analysis2/storlazzi/fastq/sratools/SRX669021/SRX669021_1.fastq -2 /data2/analysis2/storlazzi/fastq/sratools/SRX669021/SRX669021_2.fastq --output /data3/pipeline_data/output/circminer_GEO/results/SRX669021/circminer/SRX669021

so I don't know if the detection is actually fully completed.

I will attach the fastqc report for this fastq pair.

Sorry to bother you, and thanks again for the help.

nl

SRX669021_1_fastqc.zip

SRX669021_2_fastqc.zip

yenyilin commented 1 year ago

Thank you for the information. We will start testing using SRR1797219 from SRX669021 and keep you updated.

yenyilin commented 1 year ago

In the meantime I will assume you use https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_36/gencode.v36.basic.annotation.gtf.gz as your GTF (Basic gene annotation, CHR) and a corresponding FASTA file as your hg38_ucsc_filtered.fa (autosomes plus X, Y, and MT). Let me know if you use an alternative GTF so I can better reproduce your scenario.

yenyilin commented 1 year ago

We tried to run the current release circminer using 47,209,075 paired-end reads of SRR1797219 and it finished properly. Can you remind us of the gcc version in your system?

nliorni commented 1 year ago

Good morning @yenyilin. Yes, I used that reference annotation file, as you can see from the command I ran. The version of gcc I have is 8.5.0. Ok, I understand that you ran circminer on the same sample I have (SRX669021) and it does work. I don't know what the problem might be. Thank you again for your help. nl

yenyilin commented 1 year ago

@nliorni I will try it again on 8.5.0.

In the meantime I hope you don't mind me sharing some suggestions (which lead to more work on your side). In process_circ.cpp you already finished line 306 but can not proceed to line 325. (1) Do you mind running our suggested patch in line 16 to 25 of fastq_parser.cpp and tell us the results?

if (NULL != current_record) {
    // Release the per-thread buffers only if the record array was allocated.
    for (int i = 0; i < threadCount; ++i) {
        free(current_record[i].rname);
        free(current_record[i].seq);
        free(current_record[i].rcseq);
        free(current_record[i].comment);
        free(current_record[i].qual);
        free(current_record[i].rqual);
    }
    // free(current_record);
    delete[] current_record;
}

(2) If (1) still fails, can you take tail -n 1600 of both read files and run it again using only these reads?
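For reference, tail -n 1600 keeps the last 1,600 lines of a file, which is 400 reads when every FASTQ record spans exactly four lines. The same subsetting can be sketched record by record rather than line by line, so that a subset never starts mid-record. The function name last_fastq_records is hypothetical, and the sketch assumes unwrapped four-line records.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: keep the last `n_records` four-line FASTQ records.
// Assumes each record is exactly four lines (no wrapped sequences).
std::vector<std::string> last_fastq_records(std::istream& in,
                                            std::size_t n_records) {
    std::deque<std::string> window;  // sliding window over whole records
    std::string line;
    std::string record;
    std::size_t lines_in_record = 0;
    while (std::getline(in, line)) {
        record += line;
        record += '\n';
        if (++lines_in_record == 4) {
            window.push_back(record);
            if (window.size() > n_records) window.pop_front();
            record.clear();
            lines_in_record = 0;
        }
    }
    // A trailing partial record (fewer than four lines) is dropped.
    return std::vector<std::string>(window.begin(), window.end());
}
```

For paired-end data the same record count would have to be taken from both files so that the mates stay in sync.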

nliorni commented 1 year ago

@yenyilin Good morning, sorry for the late response. I will try the suggested patch and let you know. I am currently running circminer on a full dataset, and some samples seem to present this problem, as you can see from the attached log. Also, it now keeps loading the genome sequence. I'll wait and keep this updated. Thanks so much again, nl

nohup.out.txt

fhach commented 1 year ago

@nliorni Can you tell us what resources your system has (RAM, cores)? Are you running under Slurm or any other queue management system where you can restrict the resources?

nliorni commented 1 year ago

Greetings @fhach, I am running the analysis without a queue management system on a workstation with 252 GB of RAM and 64 cores.

nliorni commented 1 year ago

Hello @yenyilin, sorry for the delayed response. I applied the suggested patch and more samples work this time, but 44 of the 122 samples in the dataset still hit the same error presented in this issue. We also tried the tail -n 1600 suggestion on the first failing sample:

tail -n 1600 /data2/analysis2/fastq/sratools/SRX669025/SRX669025_1.fastq > /data2/analysis2/fastq/sratools/SRX669025/1600tail_SRX669025_1.fastq

1600_fix_log_669025.txt

it seemed to work without throwing any error, as you can see from the attached log.

Thank you again for all the help, by the way.

yenyilin commented 1 year ago

@nliorni I only have GCC 8.4.0 and 9.3.0, and both worked. Given that, I am wondering about system-level issues like the ones Faraz mentioned. (1) From your log it seems that you ran multiple jobs sequentially instead of processing them simultaneously (such as with parallel or xargs). Am I correct? (2) Can you confirm that the segmentation fault is reproducible for the same dataset? In short, does the same dataset always crash circminer? This information is important for us to rule out multiple jobs competing for memory on the node.

nliorni commented 1 year ago

@yenyilin 1) Yes, I am running the jobs sequentially through a bash script. I first tried the Snakemake WMS, but I switched to a bash script as soon as I got the presented error, to rule out a Snakemake problem. 2) Yes, the segmentation fault is reproducible for the same dataset: running the bash script again on the same dataset results in the same number of samples that throw an error. Thank you again for your help. nl
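A reproducibility check like the one described here could be scripted by running each sample's command and recording whether it ended with a segmentation fault. The following is a minimal POSIX-only sketch. The function name ended_with_segfault is hypothetical, and it relies on the common shell convention of reporting a signal-killed child as exit code 128 plus the signal number.

```cpp
#include <cassert>
#include <csignal>
#include <cstdlib>
#include <string>
#include <sys/wait.h>

// Hypothetical helper (POSIX only): run `command` through the shell and
// report whether it ended with SIGSEGV.
bool ended_with_segfault(const std::string& command) {
    int status = std::system(command.c_str());
    if (status == -1) return false;  // the shell could not be started
    // Case 1: the process that system() waited on was itself killed.
    if (WIFSIGNALED(status)) return WTERMSIG(status) == SIGSEGV;
    // Case 2: the shell survived and reported 128 + signal number.
    if (WIFEXITED(status)) return WEXITSTATUS(status) == 128 + SIGSEGV;
    return false;
}
```

Calling it twice per sample with the same command and comparing the two results would show whether a given sample crashes deterministically.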