seqan / raptor

A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences.
https://docs.seqan.de/raptor

[UPDATE] Missing update on generate reads refseq #432

Closed smehringer closed 1 month ago

smehringer commented 1 month ago

I think I never updated the changes I made to generate reads refseq.

The changes introduced here currently read in a file with per-bin weights and then generate a number of reads per bin such that large user bins receive more reads than small ones.
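The weighting step described above could be sketched roughly as follows. This is a minimal illustration, not the actual script: the weights-file format (a mapping of bin id to weight) and all names are assumptions.

```python
# Sketch (not the actual script): distribute a total read budget across
# user bins proportionally to per-bin weights, e.g. bin sizes.
# The bin-id -> weight mapping is an assumed representation of the weights file.

def reads_per_bin(weights, total_reads):
    """Map each bin id to its share of `total_reads`, proportional to its weight."""
    total_weight = sum(weights.values())
    return {
        bin_id: round(total_reads * weight / total_weight)
        for bin_id, weight in weights.items()
    }

# Larger bins receive proportionally more reads:
print(reads_per_bin({"bin_0": 100, "bin_1": 300, "bin_2": 600}, 1000))
# -> {'bin_0': 100, 'bin_1': 300, 'bin_2': 600}
```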

Followup:

seqan-actions commented 1 month ago

Documentation preview available at https://docs.seqan.de/preview/seqan/raptor/432

codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 99.94%. Comparing base (e73a40c) to head (50252b1). Report is 2 commits behind head on main.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main     #432   +/-   ##
=======================================
  Coverage   99.94%   99.94%
=======================================
  Files          51       51
  Lines        1676     1676
  Branches        1        1
=======================================
  Hits         1675     1675
  Misses          1        1
```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

smehringer commented 1 month ago

@eseiler I'm currently running the script on the new refseq dataset.

Is there a reason why we write 10 million files with one read each instead of a single file containing all reads?

eseiler commented 1 month ago

> @eseiler I'm currently running the script on the new refseq dataset.
>
> Is there a reason why we write 10 million files with one read each instead of a single file containing all reads?

Not really

smehringer commented 1 month ago

If it's fine with you, I would change this. I'm sure IT is unhappy with the 10 million files I just created :D

eseiler commented 1 month ago

> If it's fine with you, I would change this. I'm sure IT is unhappy with the 10 million files I just created :D

Alright :)
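The change agreed on above amounts to writing all simulated reads into one multi-record FASTA file instead of one file per read. A minimal sketch, where the read ids and output file name are illustrative and not the actual script's layout:

```python
# Sketch: write all simulated reads into one multi-record FASTA file
# instead of one file per read. Read ids and the file name are
# assumptions, not the actual script's output layout.

def write_reads(reads, path):
    """Write (read_id, sequence) pairs as a single FASTA file."""
    with open(path, "w") as out:
        for read_id, seq in reads:
            out.write(f">{read_id}\n{seq}\n")

write_reads([("read_0", "ACGT"), ("read_1", "TTGA")], "all_reads.fasta")
print(open("all_reads.fasta").read())
```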

eseiler commented 1 month ago

How are the weights generated?

eseiler commented 1 month ago

One major thing; otherwise, thanks for all the work. This is a big improvement. I did not check every single change, as git did not display the diff nicely. It looks fine, and I will check the read generation against the file sizes soon.

I wrote up the major differences in the commit message. I think the only big difference is that I now skip references/records that are shorter than the read length.
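The record filter described above could look roughly like this. Parsing is simplified to plain (name, sequence) pairs; the real script would read FASTA records, and all names here are assumptions.

```python
# Sketch of the filter mentioned above: skip references/records that are
# shorter than the read length, since no read can be sampled from them.
# Record parsing is simplified; the real script would use a FASTA parser.

def usable_records(records, read_length):
    """Yield only (name, sequence) pairs long enough to sample a read from."""
    for name, seq in records:
        if len(seq) >= read_length:
            yield name, seq

records = [("ref_long", "ACGT" * 50), ("ref_short", "ACG")]
print([name for name, _ in usable_records(records, read_length=100)])
# -> ['ref_long']
```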

smehringer commented 1 month ago

@eseiler will you alter/add the HLL version? I can also do it if you want.

eseiler commented 1 month ago

> @eseiler will you alter/add the HLL version? I can also do it if you want.

I can do it.

As a follow-up, we can add an "evenly" mode. Then we don't need separate generate_reads and generate_reads_refseq executables; generating reads for the simulated dataset is basically just using the same weight for each bin.

We could add the evenly mode in this PR, but because the scripts need to be adapted (and I have better versions of the script on another branch), switching over to a single executable should be a separate PR.
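The proposed "evenly" mode could be sketched like this: the weighted path uses the weights from the file, while "evenly" assigns every bin the same weight, so each bin ends up with the same number of reads. Function and mode names are assumptions, not the actual CLI.

```python
# Illustrative sketch of the proposed "evenly" mode. The weighted path
# uses per-bin weights from a file; "evenly" gives every bin the same
# weight. All names here are hypothetical, not the actual executables.

def make_weights(bin_ids, mode, file_weights=None):
    if mode == "evenly":
        # Same weight everywhere -> same number of reads per bin.
        return {bin_id: 1 for bin_id in bin_ids}
    if mode == "weighted":
        return dict(file_weights)
    raise ValueError(f"unknown mode: {mode}")

print(make_weights(["bin_0", "bin_1"], "evenly"))
# -> {'bin_0': 1, 'bin_1': 1}
```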

eseiler commented 1 month ago

@smehringer I think I'm done