yangliyan1991 closed this issue 5 years ago.
Hi @yangliyan1991,
Thanks for using our pipeline. It would be helpful to see the output of head on your mapped read file, the output of ls on your chromosome FASTA folder, and the verbose output of makedb from when you attempted to build the index. @acgtun may have some input on the WALT issue once we have more information.
I am confused: did you try to use WALT, run into problems with the index, and then use bsmap instead?
@yangliyan1991 makedb should be very fast for a 15 Mb genome file. Could you send us the output from when you ran makedb? Thanks.
Hi @bdecato @acgtun, thanks for the responses. I had used bsmap before, so I tried running methcounts on the converted data. Since methcounts did not seem to work properly, I thought it might help to realign the reads with WALT and then call the methylation rate. That is when I ran into the issue with the index.
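The head of my mapped file is as follows:
Y 3633180 3633330 ST-E00317:490:HCLNWCCXY:1:2223:21673:49549 0 + ATTAGTATATGGATGATGTTTGTGGATTGGAGGATTTTAATTTTAGATGTTTATATTAAGTTTGTTATAGGATAAGTTTGTTTGATAAGTGATTTTAGATTGGTGTGATTAAGGATTGTGGGGTTTATTATATTTGATTTGGTTAGTGTA AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJFJFJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJ
Y 3633180 3633330 ST-E00317:490:HCLNWCCXY:1:2223:22384:47755 0 + ATTAGTATATGGATGATGTTTGTGGATTGGAGGATTTTAATTTTAGATGTTTATATTAAGTTTGTTATAGGATAAGTTTGTTTGATAAGTGATTTTAGATTGGTGTGATTAAGGATTGTGGGGTTTATTATATTTGATTTGGTTAGTGTA AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJFJFJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJ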
Y 3633180 3633330 ST-E00317:490:HCLNWCCXY:1:2103:27691:15426 0 - TACACTAACCAAATCAAATATAATAAACCCCACAATCCTTAATCACACCAATCTAAAATCACTTATCAAACAAACTTATCCTATAACAAACTTAATATAAACATCTAAAATTAAAATCCTCCAATCCACAAACATCATCCATATACTAAT JJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJFJFJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJFFFAA
Y 3633505 3633655 ST-E00317:490:HCLNWCCXY:1:2202:20080:31037 0 - TGTTGTTGTTTGTTTTTAATGTTTATTATAGGATTGTTAGGAGTTTAGATTTTTTAGGTATAGTTTTGAAGGTAGTGGTGGGTATTTTTGATGTTTTTGATTTTTTAAAGATTAAGATAATGGAGGTTTAGTTAGTGAATTTAATTTTAG JJJJJJJJJJJJJJJJFFJJJJJJJJJJJJFJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJF7FJJJJJJJJJJJJJJJJJJJJJJJJJJFFFAA
Y 3633514 3633664 ST-E00317:490:HCLNWCCXY:1:1118:4584:50568 0 - ATTTGAGAATGTTGTTGTTTGTTTTTAATGTTTATTATAGGATTGTTAGGAGTTTAGATTTTTTAGGTATAGTTTTGAAGGTAGTGGTGGGTATTTTTGTTGTTTTTGATTTTTTAAAGATTAAGATAATGGAGGTTTAGTTAGTGAATT AAAF7FF-F7FJJ7FJAFJJFJJJJJJJJJJJAFFFFFJJJJ7AFJJJJJ<--<-7FFJ<FFJJJJFJJJJJFJJFAFFJJFFJAJJF<JAFF7FFA<F-<AJAFJJF-FFJJJFJFJ<<FFAFJ<FA-AJA-AF--7-7-<FFAF--FF
Y 3633524 3633674 ST-E00317:490:HCLNWCCXY:1:1212:17543:69889 0 + CTAAACCTCCATTATCTTAATCTTTAAAAAATCAAAAACATCAAAAATACCCACCACTACCTTCAAAACTATACCTAAAAAATCTAAACTCCTAACAATCCTATAATAAACATTAAAAACAAACAACAACATTCTCAAATAATCTATATC FFJAAFFAJJFJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJFJFJFJJJJJFJJAJJ<JJJJJJJJJJJJJFJJJJJJJJJJJJJFFJJJJJJJJFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJFFFAA
Y 3633524 3633674 ST-E00317:490:HCLNWCCXY:1:1212:17645:70346 0 - GATATAGATTATTTGAGAATGTTGTTGTTTGTTTTTAATGTTTATTATAGGATTGTTAGGAGTTTAGATTTTTTAGGTATAGTTTTGAAGGTAGTGGTGGGTATTTTTGATGTTTTTGATTTTTTAAAGATTAAGATAATGGAGGTTTAG AAFFFJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFJJJJJJJJFFJJJJJJJJJJJJJFJJJJJJJJJJJJJ<JJAJJFJJJJJFJFJFJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJFJJAFFAAJFF
Y 3633524 3633674 ST-E00317:490:HCLNWCCXY:1:2220:31456:26659 0 - GATATAGATTATTTGAGAATGTTGTTGTTTGTTTTTAATGTTTATTATAGGATTGTTAGGAGTTTAGATTTTTTAGGTATAGTTTTGAAGGTAGTGGTGGGTATTTTTGATGTTTTTGATTTTTTAAAGATTAAGATAATGGAGGTTTAG AAFFFJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFJJJJJJJJFFJJJJJJJJJJJJJFJJJJJJJJJJJJJ<JJAJJFJJJJJFJFJFJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJFJJAFFAAJFF
Y 3638874 3639024 ST-E00317:490:HCLNWCCXY:1:2214:14123:48775 0 - TAGGAGGTTGTTAGAAAGTTGATGTTAAATTTTTATTGGTAAAATTTAAATGATAGAAAGTTATGTTTTTTTTTTATGATATGGTTATTTAGAAATGATTGAATAAATGGTGTTGGAATTTAAGTAAGATATTTAAAAAGGGTAGGATAG AAFFFJJFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJAJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJFJ<JJJJJJJJJJJJJJJJJJJJJJJJJJJJJF<FJJJFF
Y 3638874 3639024 ST-E00317:490:HCLNWCCXY:1:2214:14336:47738 0 - TAGGAGGTTGTTAGAAAGTTGATGTTAAATTTTTATTGGTAAAATTTAAATGATAGAAAGTTATGTTTTTTTTTTATGATATGGTTATTTAGAAATGATTGAATAAATGGTGTTGGAATTTAAGTAAGATATTTAAAAAGGGTAGGATAG AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
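Before blaming methcounts, a quick way to sanity-check mapped-read lines like these is to parse each one and confirm its fields agree with each other. The helper below is hypothetical, not part of methpipe. It assumes the eight whitespace-separated columns shown (chromosome, start, end, read name, a numeric column that is 0 on every line here, strand, sequence, quality) and that end minus start should equal the sequence length; for example, the read spanning Y 3633180 to 3633330 carries a 150-base sequence.

```python
from typing import NamedTuple


class MappedRead(NamedTuple):
    """One mapped-read line; the field names are descriptive guesses."""
    chrom: str
    start: int
    end: int
    name: str
    score: int  # 0 on every line above; its meaning is assumed, not checked
    strand: str
    seq: str
    qual: str


def parse_mr_line(line: str) -> MappedRead:
    """Parse one whitespace-separated line and raise ValueError if the
    fields are malformed or inconsistent with each other."""
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}: {line!r}")
    chrom, start, end, name, score, strand, seq, qual = fields
    # int() itself raises ValueError on non-numeric coordinates.
    read = MappedRead(chrom, int(start), int(end), name, int(score),
                      strand, seq, qual)
    if read.strand not in ("+", "-"):
        raise ValueError(f"{name}: strand must be + or -, got {strand!r}")
    if read.end - read.start != len(read.seq):
        raise ValueError(f"{name}: end - start != sequence length")
    if len(read.seq) != len(read.qual):
        raise ValueError(f"{name}: sequence and quality lengths differ")
    return read
```

Running it over the first few thousand lines of test.mr.dremove (for example via itertools.islice on the open file) would surface a malformed record quickly without reading all of it.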
The command I used is:
methcounts -v -c /Bio/Database/Species/Animal/Drosophila_melanogaster/FlyBase/r6.18/WGBS_reference/ref_genome/analysis/ -o test.out test.mr.dremove
It produced no output before I killed it.
The listing of the folder of FASTA files I used is as follows:
total 132M
-rw-r--r--. 1 root root 23M Dec 26 13:01 2L.fa
-rw-r--r--. 1 root root 25M Dec 26 13:01 2R.fa
-rw-r--r--. 1 root root 27M Dec 26 13:01 3L.fa
-rw-r--r--. 1 root root 31M Dec 26 13:01 3R.fa
-rw-r--r--. 1 root root 1.3M Dec 26 13:01 4.fa
-rw-r--r--. 1 root root 23M Dec 26 13:01 X.fa
-rw-r--r--. 1 root root 3.5M Dec 26 13:01 Y.fa
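methcounts is pointed at this directory with -c, and the reads name chromosomes such as Y, so one cheap check is that every chromosome named in the reads appears as a FASTA header in that directory. The helpers below are hypothetical sketches; they take a header's name to be the text after > up to the first whitespace (an assumption about how names are matched) and only look at files ending in .fa, as in the listing above. A passing check does not explain a hang; it only rules out one kind of mismatch.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Set


def fasta_names(directory: str, suffix: str = ".fa") -> Dict[str, str]:
    """Map each FASTA header name to the file it appears in.

    The name is the text after '>' up to the first whitespace
    (an assumption about how chromosome names are matched).
    """
    names: Dict[str, str] = {}
    for path in sorted(Path(directory).glob("*" + suffix)):
        with path.open() as fh:
            for line in fh:
                if line.startswith(">"):
                    parts = line[1:].split()
                    if parts:  # skip a bare '>' with no name
                        names[parts[0]] = path.name
    return names


def missing_chroms(mr_lines: Iterable[str], known: Set[str]) -> List[str]:
    """Return chromosome names used in the reads that no FASTA header declares."""
    used = {line.split()[0] for line in mr_lines if line.strip()}
    return sorted(used - known)
```

In practice, call fasta_names on the directory passed to -c, then missing_chroms on an open handle to the mapped-read file with set(names) as the known set; an empty list means every read chromosome has a matching header.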
The output of ls on the chromosome FASTA folder I used to build the index is as follows:
total 15M
-rw-r--r--. 1 yangliyan users 12M Jan 3 08:36 1.fa
-rw-r--r--. 1 yangliyan users 3.5M Jan 3 08:38 2.fa
These files are truncated from the FASTA files used for the methylation calling above; I made them to estimate roughly how long makedb would take.
Both these files and the ones used for methylation calling contain exactly two lines: a header such as >2L, followed by the entire sequence on a single line.
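Since each chromosome here sits on one very long line, one way to test whether that layout matters is to re-wrap the files into fixed-width lines and rerun makedb on the result. The thread does not say what the bug fixed later in the thread actually was, so this is only a diagnostic sketch; the function name rewrap_fasta and the 60-character default width are arbitrary choices, not something methpipe or WALT is known to require.

```python
from typing import TextIO


def rewrap_fasta(src: TextIO, dst: TextIO, width: int = 60) -> None:
    """Copy FASTA records from src to dst with sequence lines of at most
    `width` characters. Header lines are copied minus surrounding
    whitespace; blank lines are dropped."""
    if width <= 0:
        raise ValueError("width must be positive")
    pending = ""  # leftover sequence carried over to the next input line
    for raw in src:
        line = raw.strip()
        if line.startswith(">"):
            if pending:  # finish the previous record's last short line
                dst.write(pending + "\n")
                pending = ""
            dst.write(line + "\n")
            continue
        pending += line
        full = len(pending) - len(pending) % width
        # Slice by index so a single 30 Mb line is handled in linear time.
        for i in range(0, full, width):
            dst.write(pending[i:i + width] + "\n")
        pending = pending[full:]
    if pending:
        dst.write(pending + "\n")
```

For example: with open("1.fa") as src, open("1.wrapped.fa", "w") as dst: rewrap_fasta(src, dst). The loop slices by index instead of repeatedly trimming the front of the buffer, because trimming would copy the remaining sequence on every step and make a single-line chromosome quadratic to process.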
The command I used to make the index is:
makedb -c ./test -o test.dbindex
The output of makedb is:
[IDENTIFYING CHROMS] [DONE]
chromosome files found (approx size):
./test/1.fa (12.00Mbp)
./test/2.fa (4.00Mbp)
[BIULD INDEX FOR FORWARD STRAND (C->T)]
[READING CHROMOSOMES]
The output stopped there, and nothing more was printed before I killed the process.
@yangliyan1991 From the makedb output, I cannot tell why it took so long.
I think I see the bug. Stay tuned.
(Sent by email in reply to Haifeng Chen's comment of Jan 4, 2018, 5:38 PM.)
@yangliyan1991 can you zip up one of your test chromosome files and send it to my email at USC?
(Sent by email in reply to yangliyan1991's comment of Jan 4, 2018, 5:00 PM.)
Hi @andrewdavidsmith, I have emailed you at andrewds@usc.edu.
This should be fixed by commit cf65cf5.
Hi,
I am running methcounts on my data converted with to-mr. The program has been running for 2 days but has produced no output. I included -v in the command, but no progress information has been printed. My command is as follows:
methcounts -c /Bio/Database/Species/Animal/Drosophila_melanogaster/FlyBase/r6.18/WGBS_reference/ref_genome/analysis -v -o 3.Methyl/CK-1.methyl 2.mr/CK-1.mr
The system I am using is Linux version 3.10.0-327.el7.x86_64.
The data file comes from bsmap; it was converted with the to-mr command in your pipeline and then sorted. The file contains 19231748 lines. The chromosome FASTA files are stored in the directory given in the command.
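To double-check the "sorted" part on a file this size, a streaming check can report the first out-of-order line without loading everything into memory. The sort key here (chromosome compared as plain strings, then start and end numerically, then strand) is an assumption about the order methcounts expects, not something stated in this thread; adjust it if the methpipe documentation for your version says otherwise. Python compares the chromosome names by code point, which matches sort run under LC_ALL=C.

```python
from typing import Iterable, Optional, Tuple


def first_unsorted_line(lines: Iterable[str]) -> Optional[int]:
    """Return the 1-based number of the first line that breaks the assumed
    sort order (chrom, start, end, strand), or None if the input is sorted."""
    prev: Optional[Tuple[str, int, int, str]] = None
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Columns 1, 2, 3 and 6 of the mapped-read format shown in this thread.
        key = (fields[0], int(fields[1]), int(fields[2]), fields[5])
        if prev is not None and key < prev:
            return lineno
        prev = key
    return None
```

For example: with open("2.mr/CK-1.mr") as fh: print(first_unsorted_line(fh)). A result of None means the file is in the assumed order; a line number points straight at the first problem record.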
Similarly, when I tried the makedb command from the WALT aligner, it ran for several hours on a 15M FASTA file before I killed the process. The process was still active, but it did not seem to be making progress.
I have no idea what is wrong and look forward to your help.
Thanks