tanlongzhi / dip-c

Tools to analyze Dip-C (or other 3C/Hi-C) data

Error in dip-c seg and intermediate files #21

bioyuyang opened this issue 5 years ago (status: Open)

bioyuyang commented 5 years ago

Hi Tan,

This is Yuyang from Tsinghua University, Beijing. I hope you are having a nice holiday.

I followed the "Typical Workflow" on my server and ran into trouble at the very beginning:

../seqtk-master/seqtk mergepe SRR7226685_1.fastq SRR7226685_2.fastq | ../lianti-master/lianti trim - | ../bwa-master/bwa mem -Cp ../hg19.fa - | samtools view -uS | ../sambamba-0.6.8-linux-static sort -o aln.bam /dev/stdin
./dip-c seg -v snps/NA12878.txt.gz aln.bam | gzip -c > phased.seg.gz

It threw an error at the second step:

[M::seg] pass 2: read 24000000 alignments, last at chrX:33183726
[M::seg] pass 2: read 24100000 alignments, last at chrX:45099699
[M::seg] pass 2: read 25000000 alignments, last at chrX:149610521
[M::seg] pass 2: read 25100000 alignments, last at chrY:13801064
[M::seg] pass 2: read 25200000 alignments, last at chr9_gl000198_random:71282
[M::seg] pass 2: read 25300000 alignments, last at chrUn_gl000216:27250
[M::seg] pass 2: read 25400000 alignments, last at chrUn_gl000220:139949
[M::seg] pass 2: read 25500000 alignments, last at chrUn_gl000226:14258
[M::seg] pass 2: read 25600000 alignments, last at
[M::seg] pass 2: read 26800000 alignments, last at
[M::seg] pass 2: read 26900000 alignments, last at
[M::seg] pass 2: read 27000000 alignments, last at
[M::seg] pass 2: cleaning 2230534 candidate reads
[M::seg] pass 2 done: read 27005726 alignments; kept 1815407 candidate reads (6.72% of alignments)
Traceback (most recent call last):
  File "./dip-c", line 130, in <module>
    main()
  File "./dip-c", line 42, in main
    return_value = seg.seg(sys.argv[1:])
  File "/home/DAILY_WORK/LYY/dip-c-master/seg.py", line 129, in seg
    for pileup_column in bam_file.pileup(snp_chr, snp_locus - 1, snp_locus):
  File "pysam/libcalignmentfile.pyx", line 1314, in pysam.libcalignmentfile.AlignmentFile.pileup (pysam/libcalignmentfile.c:16452)
  File "pysam/libchtslib.pyx", line 675, in pysam.libchtslib.HTSFile.parse_region (pysam/libchtslib.c:11863)
ValueError: invalid contig `1`

By the way, would you mind uploading some key intermediate files? They would make the pipeline easier to follow and to debug. For example, in the "Interactive Visualization of 3D Genomes" section, cell.3dg is used throughout to make the pretty figures. Moreover, the 3D reconstruction process seems a bit tricky, as you also showed in Fig. S8 of your Science paper. Do you have any suggestions for obtaining a reasonable simulated 3D structure?

Thanks so much for your help!
Yuyang

tanlongzhi commented 5 years ago

Hi Yuyang,

I'll take a look at your error as soon as possible.

For an example .3dg file, there's already an FTP link in README.md, but it doesn't show up because GitHub doesn't support FTP links. I've pasted it below for your convenience:

ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3271nnn/GSM3271352/suppl/GSM3271352_gm12878_06.impute3.round4.clean.3dg.txt.gz

The corresponding GEO accession contains final files like this for all single cells, as well as all intermediate files starting from raw.con.gz. However, the two earliest files, aln.bam and phased.seg.gz, have not been provided because of their large size.
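To poke at the example file before running the whole pipeline, a small loader can help. The Python sketch below is only an illustration: it assumes (not stated in this thread) that a .3dg file has one particle per line, tab-separated as homolog name (e.g. `1(mat)`), locus in bp, then x, y, z. Please check this against the actual file; `parse_3dg` and `load_3dg` are hypothetical helper names, not part of dip-c.

```python
import gzip
from typing import Dict, Iterable, List, Tuple

# Assumed layout (not confirmed in this thread): one particle per line,
# tab-separated as: homolog name, locus (bp), x, y, z.
Particle = Tuple[int, float, float, float]


def parse_3dg(lines: Iterable[str]) -> Dict[str, List[Particle]]:
    """Group particles by homolog name, e.g. '1(mat)' or '1(pat)'."""
    structure: Dict[str, List[Particle]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and any comment lines
        fields = line.split("\t")
        homolog, locus = fields[0], int(fields[1])
        x, y, z = (float(v) for v in fields[2:5])
        structure.setdefault(homolog, []).append((locus, x, y, z))
    return structure


def load_3dg(path: str) -> Dict[str, List[Particle]]:
    """Read a gzipped .3dg file such as the GEO example linked above."""
    with gzip.open(path, "rt") as handle:
        return parse_3dg(handle)
```

After downloading, something like `cell = load_3dg("GSM3271352_gm12878_06.impute3.round4.clean.3dg.txt.gz")` followed by `len(cell["1(mat)"])` would give a quick sanity check, assuming the naming above matches the file.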

tanlongzhi commented 5 years ago

Your error seems to come from a discrepancy in chromosome naming between your genome file (chr1 in your hg19.fa) and the SNP file you used (1 in snps/NA12878.txt.gz). You must change one of them to match the other.

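The importance of chromosome name matching has been mentioned in an earlier comment for this repo (https://github.com/tanlongzhi/dip-c/issues/13#issuecomment-424156876) and in another comment for the companion repo hickit (https://github.com/tanlongzhi/dip-c/issues/13#issuecomment-424448646).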
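One way to catch this mismatch before running dip-c seg is to compare the chromosome names in the SNP file with the contig names in the BAM header. The minimal Python sketch below assumes the SNP file is gzipped, tab-separated text with the chromosome name in the first column (an assumption; check your own file). All helper names are hypothetical. It lists SNP chromosomes that have no matching contig, and it can write a renamed copy of the SNP file so that `1` becomes `chr1`.

```python
import gzip
from typing import Iterable, List, Set


def snp_chromosomes(lines: Iterable[str]) -> Set[str]:
    """Chromosome names from the first tab-separated column (assumed layout)."""
    return {line.split("\t", 1)[0].strip()
            for line in lines if line.strip() and not line.startswith("#")}


def missing_contigs(snp_chroms: Set[str], bam_contigs: Iterable[str]) -> List[str]:
    """SNP chromosomes with no contig of the same name, e.g. '1' vs 'chr1'."""
    return sorted(snp_chroms - set(bam_contigs))


def add_chr_prefix(line: str) -> str:
    """Rename '1' -> 'chr1' in the first column; names already starting with 'chr' are kept."""
    if not line.strip() or line.startswith("#"):
        return line
    chrom, sep, rest = line.partition("\t")
    if not chrom.startswith("chr"):
        # Mitochondrial naming (e.g. 'MT' vs hg19's 'chrM') would need separate handling.
        chrom = "chr" + chrom
    return chrom + sep + rest


def rewrite_snp_file(src: str, dst: str) -> None:
    """Write a renamed, gzipped copy of the SNP file (hypothetical helper)."""
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        for line in fin:
            fout.write(add_chr_prefix(line))
```

Since dip-c seg already depends on pysam, the BAM contig names can be read with `pysam.AlignmentFile("aln.bam", "rb").references` and passed as `bam_contigs`. Rewriting the small SNP file is much cheaper than renaming hg19.fa and re-aligning.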

bioyuyang commented 5 years ago

OK. I will change the file and be careful in the following steps. Thanks so much for the quick response; otherwise I would have stayed stuck.


From: Longzhi Tan notifications@github.com
Sent: 2018-11-28 22:36:52
To: tanlongzhi/dip-c
Cc: bioyuyang; Author
Subject: Re: [tanlongzhi/dip-c] Error in dip-c seg and intermediate files (#21)


liubinnk1 commented 3 years ago

Hi Tan,

I ran the dip-c seg command and ran into an error.

The message is as follows:

Traceback (most recent call last):
  File "/THL8/home/liubin/software/dip-c-master/dip-c", line 130, in <module>
    main()
  File "/THL8/home/liubin/software/dip-c-master/dip-c", line 42, in main
    return_value = seg.seg(sys.argv[1:])
  File "/THL8/home/liubin/software/dip-c-master/seg.py", line 115, in seg
    seg_data.clean()
  File "/THL8/home/liubin/software/dip-c-master/classes.py", line 204, in clean
    for name in self.reads.keys():
RuntimeError: dictionary changed size during iteration
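For reference, this RuntimeError is what Python 3 raises when a dict gains or loses keys while a loop is iterating over `dict.keys()`. In Python 3 that is a live view, whereas Python 2 returned a list copy. The thread does not say whether running dip-c under Python 3 is the cause here, so treat that as an assumption. The sketch below reproduces the pattern with a hypothetical `reads` dict and shows the usual fix, which is iterating over a snapshot. If this is the cause, the analogous change on line 204 of classes.py would be `for name in list(self.reads.keys()):`. This is an untested suggestion, not the maintainer's fix.

```python
from typing import Dict, List


def clean_in_place_buggy(reads: Dict[str, List[int]]) -> None:
    """Deletes while iterating over a live keys() view: fails in Python 3."""
    for name in reads.keys():
        if len(reads[name]) < 2:
            del reads[name]  # the next loop step raises RuntimeError


def clean_in_place_fixed(reads: Dict[str, List[int]]) -> None:
    """Iterates over a snapshot of the keys, so deleting entries is safe."""
    for name in list(reads.keys()):
        if len(reads[name]) < 2:
            del reads[name]
```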

tanlongzhi commented 3 years ago

Hi @liubinnk1, please see my reply to your identical question in the other thread. Best, Tan