It's horribly slow but it works.
I manually checked many artificial reads against the genome browser, and sequence, coordinates, and annotation matched in all cases.
So far I used Evgenia's data (so C. elegans).
Since I cannot share these data publicly, just run
ln -s /data/rajewsky/home/mschilli/repo/projects/celegans_evgenia/EA_cel10T.EA_cel10T.WBcel235_81.ribosomal_transcripts.unmapped.wbcel235.sorted.unmapped.anchors.wbcel235.circs.bed.gz wbcel235.circs.bed.gz
before running make synthetic_reads.R1.fa.gz or make synthetic_reads.R1.fa.gz.
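In case it helps, here is a small sketch of a guard around that symlink step, so that make does not start with a dangling link. The function name link_circs is hypothetical and not part of this repo; the default link name is simply the one used above.

```shell
# Hypothetical helper (not part of the repo): create the symlink for the
# private circs BED file, but fail early if the source is not readable.
link_circs() {
    src="$1"                              # path to the private BED file
    dest="${2:-wbcel235.circs.bed.gz}"    # link name used above (default)
    if [ ! -r "$src" ]; then
        echo "cannot read $src" >&2
        return 1
    fi
    # -f replaces a stale link left over from an earlier run
    ln -sf "$src" "$dest"
}
```

Calling link_circs with the path above and then make synthetic_reads.R1.fa.gz would only start the build once the link resolves.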
@marvin-jens:
If you don't want a 2nd reference in the tests, maybe we can discuss what to base it on to avoid the overhead in the test runs.
I know that for unit tests it has to be much faster & cleaner.
The current code could be improved by me, or I could spend some time (next week?) to re-do this using byo to honor your legacy and learn to appreciate what you did there. ;)