shendurelab / LACHESIS

The LACHESIS software, as described in Nature Biotechnology (http://dx.doi.org/10.1038/nbt.2727)

Lachesis failed at the step of LoadDenovoCLMSFromSAMs. #16

Open lamz138138 opened 8 years ago

lamz138138 commented 8 years ago

Hi, everyone!

Lachesis is great software for linking and ordering scaffolds. After testing the software with the demo, I failed to run the script on my own data. It failed at the step of LoadDenovoCLMSFromSAMs. According to "ChromLinkMatrix.cc", the cause is "an internal inconsistency in the SAM file. Maybe the file is truncated or it has an incorrect header". Thinking it might be a sorting problem, I re-ran the sort step with picard, sorting the file by read name. However, it still failed, so I have the following questions:

1) What do "internal inconsistency" and "incorrect header" mean?
2) What is `c2.tid == c1.mtid` used for?
3) The log says "Filling 24 clusters", but I had set "CLUSTER_N = 21". Why?

Any suggestions would be appreciated! The error and log are listed below:

1. Error:

Lachesis: ChromLinkMatrix.cc:2221: void LoadDeNovoCLMsFromSAM(const string&, const string&, const ClusterVec&, std::vector<ChromLinkMatrix*>): Assertion `c2.tid == c1.mtid' failed

2. Log:

Filling 24 clusters with Hi-C data from SAM file /myPath/preprocessSAMs/test.bam (dot = 1M alignments)

tangerzhang commented 7 years ago

Hi, have you solved this problem? I have the same issue when using LACHESIS and am wondering how to solve it. Could you please provide any clues? Thanks!

lamz138138 commented 7 years ago

Hi,

As I remember, the problem resulted from an unsorted BAM. Maybe it is a problem with SortSam.jar in picard (so I changed it to SortSam). Or you have to sort the BAM before the "samtools flagstat" step (I had a new file called *consistent.bam). It has been a long time since I solved this problem and I can no longer get at the files, so I am giving below the related code that I used to revise PreprocessSAMs.pl. Hope it helps.

By the way, if it failed, you could sort the BAM file before "samtools flagstat" and then continue with the remaining steps, without repeating the previous ones.

export reSite="AAGCTT"
export bedtoolsPath="/tools/bedtools2-2.20.1/bin/bedtools"
export samtoolsPath="/tools/samtools-0.1.19/samtools"
export picardPath="/tools/picard/picard.jar"
export memSize="64G"
export filterInconsistentPath="/tools/NGS/bin/filterInconsistentPairReads.pl"
sed -i "/^my \$RE_site/s#'.*'#'$reSite'#; \
    /^my \$make_bed_around_RE_site_pl/s#'.*'#'$outputDir/preprocessSAMs/make_bed_around_RE_site.pl'#; \
    /^my \$bedtools/s#'.*'#'$bedtoolsPath'#; \
    /^my \$samtools/s#'.*'#'$samtoolsPath'#; \
    /^#my \$mem/s#\".*\"#\"$memSize\"#;/^#my \$mem/s/#//; 
    /^#my \$picard_head/s#\/.*\/#$picardPath#;/^#my \$picard_head/s/#//;
    /#run_cmd( \"\${picard_head}SortSam.jar/s/SortSam.jar/ SortSam/;
    /#run_cmd( \"\${picard_head} SortSam/s/^#//;
    /#run_cmd( \"\${picard_head}MarkDuplicates.jar/s/MarkDuplicates.jar/ MarkDuplicates/;
    /#run_cmd( \"\${picard_head} MarkDuplicates/s/^#//;
    /run_cmd( \"\${picard_head} MarkDuplicates/s#true\"#true READ_NAME_REGEX=null TMP_DIR=$outputDir/preprocessSAMs/temp MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=1000\"#;
    /^my \$nodups/s#\"\"#\".nodups\"#;
    /run_cmd( \"\$samtools view -F12/a run_cmd( \"\$filterInconsistentPath \$head.REduced\$nodups.paired_only.bam | \$samtools view -bS - -o \$head.consistent.bam\" );
    /run_cmd( \"\$samtools flagstat/s#\$head.REduced.*flagstat#\$head.consistent.bam > \$head.consistent.flagstat#;
    /^\/\//s#^//#\#//#" PreprocessSAMs.pl
 sed -i "/^my \$samtools/a my \$filterInconsistentPath = '$filterInconsistentPath';" PreprocessSAMs.pl
 sed -i "/^\/\//s#^//#\#//#" make_bed_around_RE_site.pl
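
Before going on to "samtools flagstat", it may help to confirm that the BAM really is grouped by read name. What follows is only a minimal sketch and not part of LACHESIS or PreprocessSAMs.pl: the function name `check_name_grouped` is made up for illustration, and it assumes the input holds exactly two records per read name (paired reads, primary alignments only).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of LACHESIS): reads SAM text on stdin and
# checks that records come in adjacent pairs sharing the same read name.
# Assumes exactly two records per read (no secondary/supplementary lines).
check_name_grouped() {
    awk -F'\t' '
        /^@/ { next }                       # skip SAM header lines
        {
            n++
            if (n % 2 == 1) {
                first = $1                  # first record of the expected pair
            } else if ($1 != first) {
                printf "record %d: expected mate of %s, found %s\n", n, first, $1
                bad = 1
                exit                        # jump to END after the first problem
            }
        }
        END {
            if (!bad && n % 2 == 1) {
                printf "odd number of records: %s has no mate\n", first
                bad = 1
            }
            exit bad + 0                    # status 1 if a problem was found
        }
    '
}
```

It could be fed with something like `$samtoolsPath view my.bam | check_name_grouped` (here `my.bam` is a placeholder); it stops at the first pair that does not match and returns a non-zero exit status.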

tangerzhang commented 7 years ago

Hi, Thanks for your message. I will try to solve this problem with your script :-)

Epigenetics-Wang commented 5 years ago

Hi @lamz138138, I got the same problem and am trying to solve it using your code. Where can I get the filterInconsistentPairReads.pl script? Is this problem caused by not sorting the SAM file? Could you please provide any clues? Thanks!

lamz138138 commented 5 years ago

@Epigenetics-Wang Hi, I remember it was caused by the BAM sort failing, that is, the output BAM wasn't sorted by name as expected. I think filterInconsistentPairReads.pl is my own script, but I forget exactly what it was used for (unfortunately I can't find the script now).

I solved this problem by running PreprocessSAMs.pl step by step and then checking whether the BAM was sorted by name. If it still fails, try to find the reads for which "c2.tid == c1.mtid" is false (the read names differ, which may result from the BAM sort failing), then write your own filterInconsistentPairReads.pl.
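
To make the second step concrete, here is a rough sketch of how such reads could be located from SAM text. It is not the lost filterInconsistentPairReads.pl, and the function name `find_inconsistent_pairs` is invented for illustration. It assumes that `tid` and `mtid` correspond to the SAM columns RNAME (column 3) and RNEXT (column 7), as in the samtools BAM record, and that records are expected in adjacent pairs.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the original filterInconsistentPairReads.pl):
# reads SAM text on stdin, treats every two records as a pair, and prints
# pairs where the second record is not on the reference that the first
# record names as its mate reference.
# Output columns: name of record 1, name of record 2, expected ref, actual ref.
find_inconsistent_pairs() {
    awk -F'\t' '
        /^@/ { next }                       # skip SAM header lines
        {
            n++
            if (n % 2 == 1) {
                name1 = $1
                # RNEXT "=" is SAM shorthand for "same reference as RNAME"
                mate_ref = ($7 == "=") ? $3 : $7
            } else if ($3 != mate_ref) {
                # text equivalent of the condition c2.tid == c1.mtid failing
                print name1 "\t" $1 "\t" mate_ref "\t" $3
            }
        }
    '
}
```

A possible use is `$samtoolsPath view my.bam | find_inconsistent_pairs | head` (again with `my.bam` as a placeholder). If the two names on an output line differ, the records are not mates at all, which points back to the sorting problem.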

Hope this is useful.

Epigenetics-Wang commented 5 years ago

Hi @lamz138138, thanks for your message. I will try to solve this problem with your advice :-)