calkan opened 5 years ago
I upgraded isInt in my little library libgab to accommodate values up to uint64. Can you run:
cd libgab
git status
git pull origin master
make clean
make
cd ..
make clean
make
I hope this will not overflow beyond 4 billion fragments. And yes, -c is the endogenous coverage.
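The failure in the original report ("Cannot add thousandSeparator to non-integer 2147500000") is consistent with an integer check capped at the signed 32-bit range: the last progress line was 2,147,400,000, and the next value, 2,147,500,000, is just above 2^31 - 1 = 2,147,483,647. A minimal sketch of that difference, assuming a plain decimal-string check; `fits` is a hypothetical helper, not libgab's actual `isInt`:

```python
import re

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1  # INT32_MAX = 2,147,483,647
UINT64_MAX = 2**64 - 1

# Optional sign followed by ASCII digits only.
_DECIMAL = re.compile(r"[+-]?[0-9]+")


def fits(text, lo, hi):
    """True if text is a plain decimal integer within [lo, hi] (hypothetical helper)."""
    s = text.strip()
    return bool(_DECIMAL.fullmatch(s)) and lo <= int(s) <= hi


print(fits("2147500000", INT32_MIN, INT32_MAX))  # False: just above 2^31 - 1
print(fits("2147500000", 0, UINT64_MAX))         # True
```

A 64-bit unsigned range leaves plenty of headroom for fragment counts in the billions.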
That problem is gone now, I think. However, I now get this error from ART:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
system cmd /mnt/compgen/homes/calkan/projects/ancient/gargammel/art_src_MountRainier/art_illumina -ss HS25 -amp -na -p -i data/70-5-25-40x_a.fa -l 100 -c 1 -qs 0 -qs2 0 -o data/70-5-25-40x_s failed: 134 at ./gargammel.pl line 79.
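If the number after "failed:" is Perl's raw `$?` (an assumption about how gargammel.pl reports errors), it can be decoded as documented for Perl's `system`: the low 7 bits are the signal number, bit 128 flags a core dump, and the high byte is the exit code. Under that reading, 134 is signal 6 (SIGABRT on Linux, which `std::abort` raises when `std::terminate` handles the uncaught `std::bad_alloc`) with a core dump, and the 256 in the original report is a plain exit code of 1. If 134 were instead a shell exit code, 128 + 6 points to the same signal. A small sketch:

```python
def decode_wait_status(status):
    """Decode a raw wait status as Perl documents $? after system():
    low 7 bits = signal number, bit 128 = core dumped, high byte = exit code."""
    sig = status & 127
    if sig:
        core = " (core dumped)" if status & 128 else ""
        return f"killed by signal {sig}{core}"
    return f"exited with code {status >> 8}"


print(decode_wait_status(134))  # killed by signal 6 (core dumped)
print(decode_wait_status(256))  # exited with code 1
```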
OK, that is probably because the data/70-5-25-40x_s file is 917 GB for some reason. Am I doing this wrong?
./gargammel.pl -c 30 --comp 0.7,0.05,0.25 -l 110 -rl 100 -SS HS25 -o data/70-5-25-40x data/
What I want is a total of 30X human genome coverage with 100 bp paired-end reads (fragment length 110). That should translate to 900M reads (450M pairs) of length 100 bp. Of this data set, 70% should be bacterial, 25% endogenous, and 5% present-day contamination. That's what I'm trying to get, anyway, but I guess I am misinterpreting the -c parameter.
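The read arithmetic above can be checked with a short sketch, assuming a ~3 Gb human genome (the size that yields 900M reads). Taking -c as the endogenous coverage, and assuming gargammel converts coverage to fragments as coverage × genome size / fragment length, -c 30 with a 25% endogenous share asks for roughly 3.3 billion fragments in total: about four times the intended data set, and beyond the signed 32-bit limit. The function names are illustrative only.

```python
GENOME_SIZE = 3_000_000_000  # assumption: ~3 Gb haploid human genome


def reads_for_coverage(coverage, read_len, genome_size=GENOME_SIZE):
    """Number of reads whose total length equals coverage * genome_size."""
    return round(coverage * genome_size / read_len)


def total_fragments(endo_coverage, frag_len, endo_fraction, genome_size=GENOME_SIZE):
    """Total fragments if endo_coverage refers to the endogenous part only.

    Assumes coverage = fragments * fragment length / genome size.
    """
    endogenous = endo_coverage * genome_size / frag_len
    return round(endogenous / endo_fraction)


reads = reads_for_coverage(30, 100)
print(f"{reads:,} reads = {reads // 2:,} pairs")  # 900,000,000 reads = 450,000,000 pairs
frags = total_fragments(30, 110, 0.25)
print(f"-c 30, 25% endogenous: {frags:,} fragments; above INT32_MAX: {frags > 2**31 - 1}")
```

Under the same assumptions, a 30X total with 25% endogenous data corresponds to an endogenous coverage of 30 × 0.25 = 7.5X; whether -c accepts a fractional value like 7.5 is not confirmed here.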
ART cannot read gzipped files, so we have to use plain (uncompressed) files.
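Since ART needs uncompressed input, a gzipped FASTA has to be decompressed first (on the command line, `gunzip -c file.fa.gz > file.fa` does this). A minimal Python sketch that streams the data so even very large files are never held in memory; the function name and paths are illustrative:

```python
import gzip
import shutil


def gunzip_for_art(src_gz, dst_plain, chunk_size=1 << 20):
    """Decompress src_gz into dst_plain, copying chunk_size bytes at a time."""
    with gzip.open(src_gz, "rb") as fin, open(dst_plain, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)


# Example with hypothetical paths:
# gunzip_for_art("data/example.fa.gz", "data/example.fa")
```

Plain files are much larger than their .gz counterparts (the uncompressed _a.fa in this thread is close to a terabyte), so it is worth checking free disk space first.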
Can you run ls -al on the data/70-5-25-40x* files in data/?
Can you also try running ART on a subset? Do you still get the std::bad_alloc?
ls -l data/70-5-25-40x*
-rw-rw-r-- 1 calkan compgen 984014037789 Oct 21 22:50 data/70-5-25-40x_a.fa
-rw-rw-r-- 1 calkan compgen 91693121260 Oct 20 14:17 data/70-5-25-40x.b.fa.gz
-rw-rw-r-- 1 calkan compgen 7782171239 Oct 20 05:07 data/70-5-25-40x.c.fa.gz
-rw-rw-r-- 1 calkan compgen 177612870941 Oct 21 08:27 data/70-5-25-40x_d.fa.gz
-rw-rw-r-- 1 calkan compgen 78137570969 Oct 20 04:22 data/70-5-25-40x.e.fa.gz
-rw-rw-r-- 1 calkan compgen 0 Oct 21 22:50 data/70-5-25-40x_s1.fq
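The 917 GB figure mentioned earlier matches the _a.fa file in this listing when the byte count is read in binary units: 984,014,037,789 bytes is about 984.0 GB or 916.4 GiB, and GNU `ls -lh` rounds sizes up, so it would display 917G. A quick conversion of the listed sizes (byte counts copied from the listing; the helper name is illustrative):

```python
# Byte counts copied from the ls -l output above.
SIZES = {
    "data/70-5-25-40x_a.fa": 984014037789,
    "data/70-5-25-40x.b.fa.gz": 91693121260,
    "data/70-5-25-40x.c.fa.gz": 7782171239,
    "data/70-5-25-40x_d.fa.gz": 177612870941,
    "data/70-5-25-40x.e.fa.gz": 78137570969,
    "data/70-5-25-40x_s1.fq": 0,
}


def describe(n_bytes):
    """Size in decimal gigabytes (10^9) and binary gibibytes (2^30)."""
    return f"{n_bytes / 1e9:.1f} GB, {n_bytes / 2**30:.1f} GiB"


for name, size in SIZES.items():
    print(f"{name}: {describe(size)}")
```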
ART works fine on a small subset; no std::bad_alloc.
Thanks! I have emailed the developers; let's wait. In the meantime, maybe you can split up the input using Unix split? Very sorry for the trouble.
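A record-aware alternative to `split`, assuming the input is a FASTA file in which every record starts with a '>' header line. (`split -l` only keeps records intact if every record has the same number of lines, e.g. a header plus one sequence line and an even line count per chunk.) The helper and the chunk naming scheme are hypothetical; each chunk could then be passed to art_illumina with its own -o prefix.

```python
def split_fasta(path, records_per_chunk, prefix):
    """Write consecutive chunks of records_per_chunk FASTA records to
    prefix_000.fa, prefix_001.fa, ... and return the chunk paths.

    New chunks start only at '>' header lines, so no record is cut in half.
    Lines before the first header are skipped.
    """
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be >= 1")
    paths = []
    out = None
    count = 0  # records written to the current chunk
    try:
        with open(path) as fin:
            for line in fin:
                if line.startswith(">"):
                    if out is None or count == records_per_chunk:
                        if out is not None:
                            out.close()
                        chunk_path = f"{prefix}_{len(paths):03d}.fa"
                        out = open(chunk_path, "w")
                        paths.append(chunk_path)
                        count = 0
                    count += 1
                if out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return paths
```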
On Mon, Oct 22, 2018 at 8:46 PM Can Alkan notifications@github.com wrote:
art works well with a small subset, no std:bad_alloc
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/grenaud/gargammel/issues/4#issuecomment-431933114, or mute the thread https://github.com/notifications/unsubscribe-auth/ACEWo0OOuUbBJEfbIp9VSe6_53J4E7n2ks5unhKegaJpZM4Xu2r9 .
OK. There are _b and _c files as well; should I repeat this with them? And what happens after that: is the ART output the final output?
Normally you only need the _a file: it is the one with the adapters ligated onto the deaminated fragments.
Yes, ART produces the final output.
Hi
I am trying to simulate aDNA data at high coverage. I assume the "-c" parameter sets the overall depth of coverage. Is this correct, or does it set the endogenous coverage? Here is what I run:
./gargammel.pl -c 30 --comp 0.7,0.05,0.25 -l 110 -rl 100 -SS HS25 -o data/70-5-25-40x data/
After quite a long time, gargammel fails:
....
Produced 2,147,400,000
ERROR: Cannot add thousandSeparator to non-integer 2147500000
system cmd /mnt/compgen/homes/calkan/projects/ancient/gargammel/src/adptSim -f AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATTCGATCTCGTATGCCGTCTTCTGCTTG -s AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTT -l 100 -artp data/70-5-25-40x_a.fa data/70-5-25-40x_d.fa.gz failed: 256 at ./gargammel.pl line 79.