thegenemyers / DAMASKER

Module to determine where repeats are and make soft-masks of said

datander creates las files that LAcheck fails on #13

Open dgordon562 opened 5 years ago

dgordon562 commented 5 years ago

Hi, Gene,

David Gordon here. Chris Dunn and I have been working on a datander problem, and he says it is now time to bring it to your attention. We reproduced it with the latest versions of datander and LAcheck taken directly from your repositories.

LAcheck fails like this:

  LAcheck -v raw_reads TAN.raw_reads.148.las
  TAN.raw_reads.148: Duplicate LAs (2650063 vs 2650063)
  TAN.raw_reads.148: Not OK, see stderr

Earlier, datander was run as shown below. The database is 42 GB, and I don't know how to make a smaller test case. Let me know how I can help further.

Best wishes, David Gordon

~/damasker/datander -v -w1 -h1 -e0.99 -s1000 -P. raw_reads.145 raw_reads.146 raw_reads.147 raw_reads.148

Indexing raw_reads.145

Kmer count = 399,815,313
Using 11.92Gb of space
Index occupies 5.96Gb

Comparing raw_reads.145 to itself

 6,920,248 seed hits (4.324522e-11 of matrix)
     6,510 confirmed hits (4.068155e-14 of matrix)

LAsort ./datander.25232/raw_reads.145.T@.las
LAmerge TAN.raw_reads.145.las ./datander.25232/raw_reads.145.T@.S.las

Indexing raw_reads.146

Kmer count = 399,796,247
Using 11.91Gb of space
Index occupies 5.96Gb

Comparing raw_reads.146 to itself

 6,785,328 seed hits (4.240709e-11 of matrix)
     6,468 confirmed hits (4.042385e-14 of matrix)

LAsort ./datander.25232/raw_reads.146.T@.las
LAmerge TAN.raw_reads.146.las ./datander.25232/raw_reads.146.T@.S.las

Indexing raw_reads.147

Kmer count = 399,822,315
Using 11.92Gb of space
Index occupies 5.96Gb

Comparing raw_reads.147 to itself

 6,774,151 seed hits (4.233091e-11 of matrix)
     5,788 confirmed hits (3.616856e-14 of matrix)

LAsort ./datander.25232/raw_reads.147.T@.las
LAmerge TAN.raw_reads.147.las ./datander.25232/raw_reads.147.T@.S.las

Indexing raw_reads.148

Kmer count = 399,791,637
Using 11.91Gb of space
Index occupies 5.96Gb

Comparing raw_reads.148 to itself

 6,842,248 seed hits (4.276282e-11 of matrix)
     6,089 confirmed hits (3.805515e-14 of matrix)

LAsort ./datander.25232/raw_reads.148.T@.las
LAmerge TAN.raw_reads.148.las ./datander.25232/raw_reads.148.T@.S.las

mycecilia commented 5 years ago

I had exactly the same error from LAcheck.

  TAN.raw_reads.124: Duplicate LAs (2432338 vs 2432338)
  TAN.raw_reads.124: Not OK, see stderr

Following the discussion in DALIGNER issue #42 (https://github.com/thegenemyers/DALIGNER/issues/42), I reran datander with the parameters -l1000 -s100, and the blocks that had caused the error ran through successfully.

I wonder whether this would solve your issue as well, and whether it is the correct fix.

dgordon562 commented 5 years ago

Thank you, Shiyu Chen!

On Sun, Jun 9, 2019 at 1:11 PM Shiyu Chen notifications@github.com wrote:


pb-cdunn commented 5 years ago

@mycecilia, thanks for spotting this.

@dgordon562, please click "Close" on this issue.

dgordon562 commented 5 years ago

Hi, @mycecilia,

What do I put in the cfg file so that datander runs with the parameters you specified? I tried adding them as

  pa_daligner_option = -h1 -e.99 -w1 -l1000 -s100

but run_datander.sh still contained:

  datander -v -w1 -h1 -e0.99 -P. raw_reads

What did you do?

Thanks! David

dgordon562 commented 5 years ago

@mycecilia: Or did you edit run_datander.sh by hand? If so, did you then wait for the next datander failure on a different chunk, edit that run_datander.sh, wait again, and so on, over and over? Is there any way to make -l1000 -s100 apply to all chunks?

mycecilia commented 5 years ago

@dgordon562 By the time FALCON is running tan-runs, all the script templates have already been generated with the parameters from the cfg file. I went into a script folder in 0-rawreads that has a subfolder for each chunk; each chunk subfolder contains a run_datander.sh script. I manually edited the scripts for the problematic chunks, then re-ran fc_run.py fc_run.cfg. That script folder seems to be deleted after tan-runs finish, so I can't tell you its exact name.

Alternatively, it may be easier to delete the files and folders generated by the old FALCON run, so that new script templates are generated from your current fc_run.cfg.
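If many chunks fail, the manual edit described above could be scripted instead of repeated chunk by chunk. The Python sketch below is a hypothetical helper, not part of FALCON or DAMASKER. It assumes each run_datander.sh invokes datander on a single line, like the run_datander.sh line quoted earlier in this thread. Because the exact name of the script folder is unknown, it takes a directory (for example 0-rawreads) as its argument and searches it recursively. In every datander call it drops any existing -l/-s values and inserts -l1000 -s100 right after the command name. Back up the scripts before running it.

```python
import re
import sys
from pathlib import Path

# Options suggested in this thread (via DALIGNER issue #42); adjust as needed.
EXTRA_OPTS = ["-l1000", "-s100"]
# Matches an existing -l<int> or -s<int> option token, e.g. "-s1000".
LS_OPT = re.compile(r"-[ls]\d+")


def patch_line(line: str) -> str:
    """Return `line` with EXTRA_OPTS added to a datander call, else unchanged."""
    body = line.rstrip("\r\n")
    newline = line[len(body):]                      # keep the original line ending
    indent = body[: len(body) - len(body.lstrip())]  # keep leading whitespace
    tokens = body.split()
    for i, tok in enumerate(tokens):
        # Accept "datander" as well as paths such as "~/damasker/datander".
        if Path(tok).name == "datander":
            # Drop existing -l/-s values so the new ones are unambiguous.
            rest = [t for t in tokens[i + 1:] if not LS_OPT.fullmatch(t)]
            tokens = tokens[: i + 1] + EXTRA_OPTS + rest
            return indent + " ".join(tokens) + newline
    return line


def patch_tree(root: Path) -> list:
    """Rewrite every run_datander.sh under `root`; return the files changed."""
    changed = []
    for script in sorted(root.rglob("run_datander.sh")):
        old = script.read_text()
        new = "".join(patch_line(ln) for ln in old.splitlines(keepends=True))
        if new != old:
            script.write_text(new)
            changed.append(script)
    return changed


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python patch_datander.py <directory>")
    else:
        for path in patch_tree(Path(sys.argv[1])):
            print("patched", path)
```

For example, running "python patch_datander.py 0-rawreads" (patch_datander.py is a hypothetical file name) would print each script it changed, after which fc_run.py fc_run.cfg can be re-run as described above. The rewrite is idempotent, so running it twice leaves already-patched scripts unchanged.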