marbl / merqury

k-mer based assembly evaluation

Error when running #47

Closed boucherufl closed 2 years ago

boucherufl commented 3 years ago

I ran Meryl without issues on all FASTQ files and merged the resulting files. I then ran Merqury as follows:

$MERQURY/merqury.sh ./manatee.meryl /blue/conesa/share/manatee_assembly/ont_polishing/polished_sequences.fasta CBtest

It runs without printing any error to stdout, but the output files are empty and there are errors in the log file. See below:

/home/christinaboucher/merqury-1.3/eval/spectra-cn.sh: line 88 Copy = 3 ..

Copy = 4 ..

Copy >4 ..

Copy numbers in k-mers found only in asm

No asm2_fa given. Done.

polished_sequences only

Write output

Get asm only for spectra-asm

Plot CBtest.spectra-asm.hist

Rscript /home/christinaboucher/merqury-1.3/plot/plot_spectra_cn.R -f CBtest.spectra-asm.hist -o CBtest.spectra-asm -z CBtest.dist_only.hist

[1] "x_max: "

Clean up

Done!

cannot remove ‘read.k.polished_sequences.3.meryl’: No such file or directory

/home/christinaboucher/merqury-1.3/eval/spectra-cn.sh: line 88: meryl: command not found

/home/christinaboucher/merqury-1.3/eval/spectra-cn.sh: line 89: meryl: command not found

rm: cannot remove ‘read.k.polished_sequences.4.meryl’: No such file or directory

/home/christinaboucher/merqury-1.3/eval/spectra-cn.sh: line 95: meryl: command not found

/home/christinaboucher/merqury-1.3/eval/spectra-cn.sh: line 96: meryl: command not found

rm: cannot remove ‘read.k.polished_sequences.gt4.meryl’: No such file or directory

/home/christinaboucher/merqury-1.3/eval/spectra-cn.sh: line 102: meryl: command not found

/home/christinaboucher/merqury-1.3/eval/spectra-cn.sh: line 103: meryl: command not found

/home/christinaboucher/merqury-1.3/eval/spectra-cn.sh: line 104: meryl: command not found

/home/christinaboucher/merqury-1.3/eval/spectra-cn.sh: line 105: -: syntax error: operand expected (error token is "-")

/home/christinaboucher/merqury-1.3/eval/spectra-cn.sh: line 117: meryl: command not found

/home/christinaboucher/merqury-1.3/eval/spectra-cn.sh: line 121: meryl: command not found

/home/christinaboucher/merqury-1.3/eval/spectra-cn.sh: line 122: meryl: command not found

/home/christinaboucher/merqury-1.3/eval/spectra-cn.sh: line 125: meryl: command not found

Loading required package: argparse

Loading required package: ggplot2

Loading required package: scales

Error in [.data.frame(dat_0, , 3) : undefined columns selected

Calls: spectra_cn_plot -> [ -> [.data.frame

In addition: Warning message:

In max(dat[dat[, 1] != "read-total" & dat[, 1] != "read-only" & :

no non-missing arguments to max; returning -Inf

Execution halted

rm: cannot remove ‘polished_sequences.0.meryl’: No such file or directory

rm: cannot remove ‘read.k.polished_sequences.0.meryl’: No such file or directory

rm: cannot remove ‘read.k.polished_sequences.meryl’: No such file or directory

rm: cannot remove ‘manatee.gt0.meryl’: No such file or directory

Insight into correcting would be great.

Thanks. Christina

arangrhie commented 3 years ago

Hello Christina, could you double check if the path to meryl bin is in your $PATH? Seems like meryl is not found.
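The PATH check suggested here can be sketched as follows, assuming a bash shell. `check_tool` is a hypothetical helper, not part of Merqury; it only reports whether a command resolves through `$PATH` in the shell where it is called.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Merqury): report where a command
# resolves on $PATH, or return non-zero if it does not resolve at all.
check_tool() {
    local tool="$1" resolved
    if resolved=$(command -v "$tool"); then
        echo "$tool found at: $resolved"
    else
        echo "$tool not found in PATH" >&2
        return 1
    fi
}

# Usage, in the same shell or job script that launches merqury.sh:
#   check_tool meryl && check_tool Rscript
```

On a cluster it may be worth running the check inside the job script itself, since a module loaded in an interactive session is not necessarily loaded in the environment where the scripts actually run.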

boucherufl commented 3 years ago

Hi Arang,

Thanks for your reply. I am still struggling to get it running properly. Rather than installing it locally, I have asked Hipergator support to install it as a module.

I triple checked and meryl is installed correctly. When I type "meryl" the help comes up.

What is disconcerting is that when I run:

$MERQURY/merqury.sh manatee.meryl polished_sequences.fasta CB-test2

I get the following error in the log:

Can't interpret 'manatee.meryl': not a meryl command, option, or recognized input file.

Can't interpret 'polished_sequences.meryl': not a meryl command, option, or recognized input file.

Can you give me some insight?

Best,

Christina


Christina Boucher, Associate Professor, Computer & Information Science & Engineering Department, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL 32611. http://www.christinaboucher.com/ Google Scholar: https://scholar.google.com/citations?user=wpPBcf4AAAAJ&hl=en&citsig=AMstHGQcx72PMDLXmo8GRH2-sYilrgTdjg



arangrhie commented 3 years ago

Something is odd here. Could you double check the meryl version installed? Make sure v1.3 release version is installed, for both Merqury and Meryl.

boucherufl commented 3 years ago

Yes, both are installed. Both are versions 1.3, as they were installed a couple of weeks ago.



arangrhie commented 3 years ago

Hmm, ok, let's check if the manatee.meryl is correctly built. What do you get when running meryl statistics manatee.meryl | head ?
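Before looking at the statistics, it may help to confirm that the path passed to Merqury points at a meryl database from the current working directory. The sketch below assumes that a meryl database is a directory containing a `merylIndex` file; `check_meryl_db` is a hypothetical helper name, not a meryl or Merqury command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check that a path looks like a meryl database.
# Assumption: a meryl database is a directory that contains a merylIndex file.
check_meryl_db() {
    local db="$1"
    if [ ! -d "$db" ]; then
        echo "$db: not a directory (check the working directory and the path)" >&2
        return 1
    fi
    if [ ! -f "$db/merylIndex" ]; then
        echo "$db: no merylIndex file; the count or merge step may be incomplete" >&2
        return 1
    fi
    echo "$db: looks like a meryl database"
}

# Usage, from the directory where merqury.sh is launched:
#   check_meryl_db manatee.meryl
```

This only checks the layout on disk; it says nothing about whether the k-mer counts inside are sensible.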

arangrhie commented 3 years ago

If that is giving a reasonable summary, try running spectra-cn yourself: $MERQURY/eval/spectra-cn.sh manatee.meryl polished_sequences.fasta CB-test2 and let me know what the log says.

boucherufl commented 3 years ago

Found 1 command tree.

Number of 21-mers that are:

unique 1345124229 (exactly one instance of the kmer is in the input)

distinct 2255727146 (non-redundant kmer sequences in the input)

present 7479513837 (...)

missing 4395790783958 (non-redundant kmer sequences not in the input)

           number of   cumulative   cumulative     presence

            distinct     fraction     fraction   in dataset

frequency      kmers     distinct        total       (1e-6)




arangrhie commented 3 years ago

Ok, it seems to be a 'reasonable' meryl database, except that the number of k-mers seems quite small.

How did you obtain manatee.meryl? Was this from Illumina reads? What was the sequencing depth? The summary says there are 1.3 G unique out of 2.2 G distinct k-mers, which seems too small for a 3~4 Gb genome. I wonder if this was obtained from the assembly?
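The arithmetic behind this reading can be sketched from the `meryl statistics` numbers posted earlier in the thread. `kmer_summary` is a hypothetical helper; the two ratios are only crude indicators chosen for illustration, not a depth estimate and not something Merqury itself reports.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise the unique, distinct and present counts
# printed by `meryl statistics` as two simple ratios.
kmer_summary() {
    # Arguments: unique distinct present
    awk -v u="$1" -v d="$2" -v p="$3" 'BEGIN {
        printf "unique/distinct: %.3f\n", u / d    # share of k-mers seen exactly once
        printf "present/distinct: %.2f\n", p / d   # mean count per distinct k-mer
    }'
}

# Usage with the values from the statistics output in this thread:
#   kmer_summary 1345124229 2255727146 7479513837
#   unique/distinct: 0.596
#   present/distinct: 3.32
```

With these numbers, roughly 60% of the distinct k-mers are seen only once and each distinct k-mer is seen about 3.3 times on average, which is in line with the concern about low coverage.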

boucherufl commented 3 years ago

The manatee.meryl was obtained from Illumina data. I am uncertain about the sequencing depth, as I am coming into the project late; my task is to assess the assembly quality to decide on the next steps.

I'd like to just get it running so I can report back to Adam and the other collaborators, even if the assembly quality is low.



boucherufl commented 3 years ago

Hipergator research staff installed it. Using their installation, the problem seems to persist.



arangrhie commented 3 years ago

Unfortunately, I don't think Merqury will give a reasonable analysis with the given .meryl, as the sequencing depth seems too low. It looks like 1~3x, with no peak. It might be better to go back and check whether more Illumina reads are available.

For a spectra-cn-like analysis, there should be a clear peak that distinguishes erroneous k-mers from true copy number k-mers. meryl histogram manatee.meryl > manatee.hist will give you the histogram, from which the above can be estimated.
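One way to look for such a peak in manatee.hist is sketched below. It assumes the histogram file has two whitespace-separated columns (k-mer multiplicity, then number of distinct k-mers) sorted by multiplicity; `find_hist_peak` is a hypothetical helper and the rule it applies is a simple heuristic, not what Merqury does internally.

```shell
#!/usr/bin/env bash
# Hypothetical helper: find the highest point of the histogram after the
# initial error slope. Assumes two columns: multiplicity and count.
find_hist_peak() {
    awk '
        NF < 2 { next }                                # skip blank or malformed lines
        { c = $2 + 0 }                                 # force a numeric count
        seen && !rising && c > prev { rising = 1 }     # counts climb again: past the valley
        rising && c > best { best = c; best_s = $2; peak = $1 }
        { prev = c; seen = 1 }
        END {
            if (rising) print "peak at multiplicity " peak " (" best_s " distinct k-mers)"
            else        print "no peak: counts only decrease"
        }
    ' "$1"
}

# Usage:
#   meryl histogram manatee.meryl > manatee.hist
#   find_hist_peak manatee.hist
```

A histogram whose counts only fall from multiplicity 1 onward would be reported as having no peak. Real histograms can be noisy in the tail, so plotting the first few dozen rows is a useful cross-check.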

QV estimation may also fail and underrepresent the truth if the coverage is not enough to cover the full assembly. What was the assembly size?

boucherufl commented 3 years ago

Do you have some test files that I can run to make sure this is an issue with the data and not the install / software?



arangrhie commented 3 years ago

Try this: https://github.com/marbl/merqury#example Let me know if the same error persists!