yezhengSTAT / FreeHiC

FreeHi-C pipeline for high fidelity Hi-C data simulation.
MIT License
8 stars 9 forks source link

FreeHiC Bugs current #6

Open Ruiqing-Wang opened 2 years ago

Ruiqing-Wang commented 2 years ago

I have encountered some bugs and failed to run them on several machines. The first problem is the dynamic library, but this is not the main problem. You can see the difficulties encountered in running demo during simulation.

projDir="/home/wrq/software/FreeHiC"

fastqFile="ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM1215nnn/GSM1215593/suppl/GSM1215593_trimmedAndFiltered-TROPHOZOITES-XL-AGGG-L2" #"${projDir}/data/GSM1215593_trimmedAndFiltered-TROPHOZOITES-XL-AGGG-L2"

fastqFile="${projDir}/demoData/GSM1215593_trimmedAndFiltered-TROPHOZOITES-XL-AGGG-L2" ref="${projDir}/data/PlasmoDB-9.0_Pfalciparum3D7_Genome.fasta" refrag="${projDir}/data/MboI_resfrag_plasmodium.bed" simuName="results" outDir="${projDir}/results" summaryFile="${projDir}/results/summary/${simuName}_FreeHiC.summary"

bwa="bwa" samtools="samtools" bedtools="bedtools"

train=0 simulate=1 postProcess=1 coreN=30 mismatchN=3 gapN=1 mismatchP="" gapP="" chimericP="" simuN=2400000 readLen=40 resolution=10000 lowerBound=$((resolution*2)) refragU=500 ligateSite="GATCGATC" ##################

*****

II. Simulating interaction read sequences

*****

II. Simulating! RE fragment = /home/wrq/software/FreeHiC/data/MboI_resfrag_plasmodium.bed Fragment interaction file = /home/wrq/software/FreeHiC/results/rawDataTraining/s2_validPairs/GSM1215593_trimmedAndFiltered-TROPHOZOITES-XL-AGGG-L2.validPairs.fragFreq Output directory = /home/wrq/software/FreeHiC/results/results/simuSequence Output file name = GSM1215593_trimmedAndFiltered-TROPHOZOITES-XL-AGGG-L2 Total number of interactions to simulate = 2400000 Read length to simulate = 40 Maximum distance away from restriction site = 500 Path to bedtools = bedtools Path to reference genome fasta file = /home/wrq/software/FreeHiC/data/PlasmoDB-9.0_Pfalciparum3D7_Genome.fasta Path to summary file = /home/wrq/software/FreeHiC/results/summary/results_FreeHiC.summary Verbose = True Read in restriction fragment dictionary! Read in interaction frequency! Convert dictionary and list into C structure! Sample fragment interaction! Read in trained parameters! 10129112 777700 51566 28458 Simulate end 1! 0 500000 1000000 1500000 2000000 Get fasta from the reference genome! Add mutation! Traceback (most recent call last): File "scripts/freeHiC_cython.pyx", line 450, in freeHiC_cython.simulation rid = int(ridTmp) ValueError: invalid literal for int() with base 10: '23::chr7:164903-164943(-)'

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "scripts/freeHiC_cython.pyx", line 452, in freeHiC_cython.simulation rid = int(ridTmp[0:-3]) ValueError: invalid literal for int() with base 10: '23::chr7:164903-164943' Exception ignored in: 'freeHiC_cython.simulation' Traceback (most recent call last): File "scripts/freeHiC_cython.pyx", line 452, in freeHiC_cython.simulation rid = int(ridTmp[0:-3]) ValueError: invalid literal for int() with base 10: '23::chr7:164903-164943' II. Simulation Post-processing: 1 - Alignment! (We assume the reads length are the same within the same fastq file.)...... Start alignment of read end 1 Step1.1 - BWA alignment [main] Version: 0.7.17-r1188 [main] CMD: bwa samse /home/wrq/software/FreeHiC/data/PlasmoDB-9.0_Pfalciparum3D7_Genome.fasta /home/wrq/software/FreeHiC/results/results/simuProcess/s1_alignment/GSM1215593_trimmedAndFiltered-TROPHOZOITES-XL-AGGG-L2_1.sai /home/wrq/software/FreeHiC/results/results/simuSequence/GSM1215593_trimmedAndFiltered-TROPHOZOITES-XL-AGGG-L2_1.fastq [main] Real time: 13.558 sec; CPU: 12.875 sec Choose not to rescue chimeric reads or the chimeric reads have been trimmed before alignment! Start alignment of read end 2 cat: /home/wrq/software/FreeHiC/results/results/simuSequence/GSM1215593_trimmedAndFiltered-TROPHOZOITES-XL-AGGG-L2_2.fastq: No such file or directory Step1.1 - BWA alignment [bwa_seq_open] fail to open file '/home/wrq/software/FreeHiC/results/results/simuSequence/GSM1215593_trimmedAndFiltered-TROPHOZOITES-XL-AGGG-L2_2.fastq' : No such file or directory [fread] Unexpected end of file Choose not to rescue chimeric reads or the chimeric reads have been trimmed before alignment! [bam_sort_core] merging from 0 files and 30 in-memory blocks... II. Simulation Post-processing: 2 - Joining read-ends! Read 1 file = /home/wrq/software/FreeHiC/results/results/simuProcess/s1_alignment/GSM1215593_trimmedAndFiltered-TROPHOZOITES-XL-AGGG-L2_1.sam Read 2 file = /home/wrq/software/FreeHiC/results/results/simuProcess/s1_alignment/GSM1215593_trimmedAndFiltered-TROPHOZOITES-XL-AGGG-L2_2.sam Output file = /home/wrq/software/FreeHiC/results/results/simuProcess/s1_alignment/GSM1215593_trimmedAndFiltered-TROPHOZOITES-XL-AGGG-L2.sam Summary of alignment results = True Summary file path = /home/wrq/software/FreeHiC/results/summary/results_FreeHiC.summary Training = 0 Verbose = True ------------Begin joining ends from read 1 file and read 2 file------------ Traceback (most recent call last): File "/home/wrq/software/FreeHiC/scripts/joinEnds.py", line 165, in with pysam.Samfile(args.readEnd1, "rb") as readEnd1, pysam.Samfile(args.readEnd2, "rb") as readEnd2: File "pysam/libcalignmentfile.pyx", line 741, in pysam.libcalignmentfile.AlignmentFile.cinit File "pysam/libcalignmentfile.pyx", line 990, in pysam.libcalignmentfile.AlignmentFile._open ValueError: file has no sequences defined (mode='rb') - is it SAM/BAM format? Consider opening with check_sq=False II. Simulation Post-processing: 3 - Categorize read-pairs! Paired-ended read alignmeng file does not exist! II. Simulation Post-processing: 4 - Binning! awk: fatal: cannot open file `/home/wrq/software/FreeHiC/results/results/simuProcess/s2_validPairs/GSM1215593_trimmedAndFiltered-TROPHOZOITES-XL-AGGG-L2.validPairs' for reading (No such file or directory)

yezhengSTAT commented 2 years ago

Hello Ruiqing, Apologize for the delayed rely! I think the issue of Traceback (most recent call last): File "scripts/freeHiC_cython.pyx", line 450, in freeHiC_cython.simulation rid = int(ridTmp) ValueError: invalid literal for int() with base 10: '23::chr7:164903-164943(-)' is due to line 430 in freeHiC_cython.pyx:

call(bedtools + ' getfasta -fi ' + genome + ' -bed ' + outdir + '/' + fileName + '.readPosMut1 -name -s -tab -fo ' + outdir + "/" + fileName + '.readSeqMut1', shell = True)

May I know which version of bedtools did you use? bedtools seems to updated its getfasta functions several times. Therefore, we recommended using an older version (<=2.25.0) for stability.

Hope it can address your problem.

Best, Ye

Ruiqing-Wang commented 2 years ago

When I tried to modify code, we found more error. Thanks, It could work now.