mapleforest / HaploMerger2


error XHM_haploMerger.pl line 833 #4

Open chklopp opened 7 years ago

chklopp commented 7 years ago

While testing on a genome of interest, hm.batchB3.haplomerger does not produce correct output, and _B3.haploMerger.log contains the following error:

HM_haploMerger.pl line 833.
Use of uninitialized value in length at ../bin/XHM_haploMerger.pl line 833.
Use of uninitialized value in length at ../bin/XHM_haploMerger.pl line 833.
Use of uninitialized value in length at ../bin/XHM_haploMerger.pl line 833.
Use of uninitialized value in length at ../bin/XHM_haploMerger.pl line 833.
...

The beginning of the file seems OK.

Ch+

mapleforest commented 7 years ago

Dear Sir,

This is strange. It appears that the output unincorporated sequence is not defined, but I cannot pinpoint the problem from the current information.

Can you send me all the log files pertinent to hm.batchB?

Have you run both example1 and example2 successfully?

You can use faDnaPolishing.pl to check whether there are strange sequences/characters in the genomic sequences.

Do you have more than 2000 scaffolds in your assembly? If so, more file handles should be assigned (see the manual).
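The two checks suggested above (scaffold count and unexpected characters) can be sketched roughly as follows. This is an illustration only, not part of HaploMerger2, and it does not replace faDnaPolishing.pl. The function names are hypothetical, and the allowed alphabet (A, C, G, T, N in either case) is an assumption; the 2000-scaffold threshold is the one quoted in the comment.

```python
import gzip

# Assumption: plain DNA plus N, upper or lower case. Extend this set if
# other IUPAC codes are acceptable in your assembly.
ALLOWED = set("ACGTNacgtn")


def open_fasta(path):
    """Open a plain or gzip-compressed FASTA file as text."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def scan_fasta(handle):
    """Return (sequence count, {sequence id: set of unexpected characters})."""
    count = 0
    odd = {}
    current = None
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            count += 1
            fields = line[1:].split()
            current = fields[0] if fields else ""
        elif line and current is not None:
            bad = set(line) - ALLOWED
            if bad:
                odd.setdefault(current, set()).update(bad)
    return count, odd


# Example use (the file name is hypothetical):
# with open_fasta("genome.fa.gz") as fh:
#     n, odd = scan_fasta(fh)
# print(n, "sequences;", "more than 2000" if n > 2000 else "2000 or fewer")
# for name, chars in odd.items():
#     print(name, sorted(chars))
```

If the count exceeds 2000, the manual's instructions on file handles apply; if any sequence reports unexpected characters, the file is a candidate for cleaning.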


--

best regards,

Shengfeng Huang (黄盛丰)
School of Life Sciences, Sun Yat-sen University
hshengf2@mail.sysu.edu.cn
http://sklbc.sysu.edu.cn/Team/User/info.aspx?typeid=283&pid=46


This email and its attachments may contain confidential information intended for a specific individual and purpose. If you are not the intended recipient, you should delete this email and notify the sender immediately. Any use, dissemination, distribution, or copying of this email or its attachments by persons other than the intended recipient(s), is strictly prohibited.

chklopp commented 7 years ago

When I try example 1

sh ./hm.batchB1.initiation_and_all_lastz genome

Running command : ./hm.batchB1.initiation_and_all_lastz genome

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

These files/directories are going to be output :
genomex.fa.gz - a copy of the genome sequence file genome.fa(.gz)
genome.sizes - the size of each scaffold sequences
genomex.sizes - the size of each scaffold sequences
genome.seq/.fa and .nib - fasta files and nib files for scaffold sequences
genomex.seq/.fa and .nib - fasta files and nib files for scaffold sequences
** genome.genomex.result/raw.axt/.axt, .axt., and *.log - raw lastz result files and their log files
log files:
_B1.initiation.log - log file
_B1.all_lastz.log - log file

more _B1.all_lastz.log

========== Start at Wed May 17 12:02:33 CEST 2017
../bin/HM_all_lastz_mThreads.pl --Species genome genomex --notrivial --radius=5000 --threads=1 --identity=80 --targetSize=50000000 --querySize=1600000000 --Force --Delete

Set to OVER-WRITING mode!
Set to Delete mode!
Species included: genome genomex
checking missing gz-compressed multiple-fasta files ...
Every required files seem ok, going on ...

Check existed raw.axt directory ...
checking if genome.genomex.result/raw.axt is existing ...
cleaning the evironment anyway ...
Thread number is set to ... 1 !
Target file size is ... 50000000 !
Query file size is ... 1600000000 !
Is_noself status is ... 0 !
--notrivial status is ... 1 !
--radius for notrivial function is ... 5000 !
--unmask for covert lowcase letters to upcase letters ... !
go to lastz all to all step ...

Finished splitting the target fasta file ...
Finished splitting the query fasta file ...
lastz genome to genomex ...

Finished lastz all genome to all genomex !

========== Time used = 0 seconds or 0 hours.

sh ./hm.batchB2.chainNet_and_netToMaf genome

Running command : ./hm.batchB2.chainNet_and_netToMaf genome

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

These files/directories are going to be output :
genome.genomex.result/all.chain.gz - raw chain file, gzip compressed
genome.genomex.result/all.tbest.chain.gz - target-best chain file
genome.genomex.result/all.tbest.net - target-best net file
genome.genomex.result/all.rbest.chain.gz - reciprocal best chain file, gzip compressed
genome.genomex.result/all.rbest.net.gz - reciprocal best net file, gzip compressed
genome.genomex.result/zeroMinSpace.rbest.net.gz - the zeroMinSpace net file
genome.genomex.result/mafFiltered.net.gz - the net file for mafFiltered.net.maf.gz
genome.genomex.result/mafFiltered.net.maf.gz - maf alignments, all scaffolds in one file, deleted
genome.genomex.result/mafFiltered.net.maf.tar.gz - maf alignments, one file per scaffold, for viewing
log files:
genome.genomex.result/*.log - log files for HM_axtChainRecipBestNet.pl
_B2.axtChainRecipBestNet.log - log file
_B2.netToMaf.log - log file

NOTE THAT you may delete the directory genome.genomex.result/raw.axt after having sucessfully finished this script!


NOTE THAT if this script finished sucessfully, you can delete the directory genome.genomex.result/raw.axt to save tens of gigabases' disk space !


But the raw.axt directory is empty:

ls -ltr genome.genomex.result/raw.axt
total 0

mapleforest commented 7 years ago

Yes, this is the problem. Possibly hm.batchB could not find lastz and chainNet because they are not in the path, but I am not sure.

The log files (including those under genome.genomex.result) contain all the information.

You can pack up the whole example1 directory and send it to me. I would like to look into it.
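A quick way to test the "not in the path" hypothesis before re-running the batch scripts is to ask whether each program can be resolved on the current PATH. The sketch below uses Python's standard `shutil.which`. The list contains only the two program names mentioned in this thread and is an assumption, not the complete set of dependencies (see the HaploMerger2 manual for that); `missing_tools` is a hypothetical helper name.

```python
import shutil

# Assumption: only the two names mentioned in this thread. The complete list
# of external programs is documented in the HaploMerger2 manual.
TOOLS_FROM_THREAD = ["lastz", "chainNet"]


def missing_tools(tools):
    """Return the entries of `tools` that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Example use:
# for tool in missing_tools(TOOLS_FROM_THREAD):
#     print(f"{tool} was not found on PATH")
```

An empty result only shows that the executables can be found; it does not show that they run correctly.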


chklopp commented 7 years ago

Hi,

I have added the required software to the path, and HM2 now runs correctly on the example. But on my genome I get the following error in _B3.pathFinder_preparation.log:

========== Start at Mon May 29 09:22:18 CEST 2017
../bin/HM_pathFinder_preparation.pl --Species genome genomex --Force --Delete --scoreScheme=score --filter=20000

Species included: genome genomex
Set the scoring scheme to score, and set the filter score/ali_len to 20000 .
Set to OVER-WRITING mode!
Produce tsc/qsc_ids, total id are 408 !
Read the zeroMinSpace.rbest.net.gz file and do some basic node filtering ...
Total nodes are 2741 ; deleted nodes are 41; add-up score counts are 234.
To perform advanced node filtering and to produce the final data set of nodes ...
2741 left before advanced filtering.
Set filter score/ali_len to 20000 for node advanced filtering ...
846 low-scored nodes have been filtered (deletion status value = -2)!
0 embeded nodes have been filtered (deletion status value = -4)!
1895 nodes left after advanced filtering (no mirror).
Recreate the perfect mirror alignments (nodes) ...
1582 nodes left (with mirror).
To produce scaffold portion data structure ...
Cut the scaffold into portions ...
5 target scaffolds have no alignment hit !
5 query scaffolds have no alignment hit !
Appending infomation of Ns and lowcases to nodes ...
Output the scaffold infomation to genome.genomex.result/hm.scaffolds.
Output the scaffold portion infomation to genome.genomex.result/hm.sc_portions.
Output the node (alignment block) infomation to genome.genomex.result/hm.nodes.
Finished preparation for pathFinder.pl.

========== Time used = 10 seconds or 0.00277777777777778 hours.

r_preparation.pl line 898, <$gzFH> line 1245.
Use of uninitialized value within %sc_names in array element at ../bin/HM_pathFinder_preparation.pl line 898, <$gzFH> line 1245.
Use of uninitialized value within %sc_names in array element at ../bin/HM_pathFinder_preparation.pl line 898, <$gzFH> line 1245.
substr outside of string at ../bin/HM_pathFinder_preparation.pl line 900, <$gzFH> line 1245.
Use of uninitialized value in transliteration (tr///) at ../bin/HM_pathFinder_preparation.pl line 901, <$gzFH> line 1245.
Use of uninitialized value in transliteration (tr///) at ../bin/HM_pathFinder_preparation.pl line 902, <$gzFH> line 1245.
Use of uninitialized value within %sc_names in array element at ../bin/HM_pathFinder_preparation.pl line 898, <$gzFH> line 1245.
substr outside of string at ../bin/HM_pathFinder_preparation.pl line 900, <$gzFH> line 1245.
Use of uninitialized value in transliteration (tr///) at ../bin/HM_pathFinder_preparation.pl line 901, <$gzFH> line 1245.
Use of uninitialized value in transliteration (tr///) at ../bin/HM_pathFinder_preparation.pl line 902, <$gzFH> line 1245.
Use of uninitialized value within %sc_names in array element at ../bin/HM_pathFinder_preparation.pl line 898, <$gzFH> line 1245.
...

Where can it come from?

chklopp commented 7 years ago

The error came from the header lines of the FASTA file. The header lines of my genome FASTA file had some text after the identifier; when I remove that text, HM2 runs fine.
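The comment does not say how the extra header text was removed. One way to do it is sketched below, on the assumption that the identifier is the first whitespace-delimited token after ">" and that everything following it can be dropped. The function name and the file names in the usage comment are hypothetical.

```python
def strip_header_descriptions(lines):
    """Yield FASTA lines with each header reduced to '>' plus its identifier.

    Assumption: the identifier is the first whitespace-delimited token
    after '>'. Sequence lines are passed through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            identifier = fields[0] if fields else ""
            yield ">" + identifier + "\n"
        else:
            yield line


# Example use (file names are hypothetical):
# with open("genome.fa") as src, open("genome.clean.fa", "w") as dst:
#     dst.writelines(strip_header_descriptions(src))
```

Writing to a new file rather than editing in place keeps the original assembly available if the identifiers turn out not to be unique after truncation.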

mapleforest commented 7 years ago

HM2 includes a Perl script for cleaning up the FASTA file (faDnaPolishing.pl).

A FASTA file that comes directly from a de novo assembler is normally clean, but if further unexpected problems occur, faDnaPolishing.pl can be run to clean up the file to be safe.
