Currently the taxonomy and Kraken DB are from 2022-12. This is getting quite old!
Approach for getting up to date:
Come up with a name that shows files were processed under a more recent taxonomy. I'm leaning towards sticking it in the stage path, so instead of processed/ we'll now have processed-2024-06/. All the old files will still be there.
Move human-viruses-raw.tsv and human-viruses.tsv, plus, within dashboard, *.dmp, top_species_counts, and top_species_scratch, into a scratch location. Delete hvreads, readlengths, ribofrac, cladecounts, and allmatches.
Update download-taxonomy.sh to pull the latest taxonomy available at the time we make the change.
Update run.py to use new timestamp-based directories for processed/ and everything downstream from it.
Update prepare-shm-kraken.sh to pull the latest Kraken DB.
The latest Kraken DB release is currently 2024-01-12, but historically these have come out in the early summer, so we may be about to see an update.
Reprocess bioprojects as needed.
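
A rough sketch of how the timestamp-based stage directories might look in run.py. This is only an illustration of the naming idea, not the actual code: the names TAXONOMY_VERSION, VERSIONED_STAGES, and stage_dir are hypothetical, as is the "raw" stage used in the usage lines. "2024-06" is the example label from the plan above.

```python
from pathlib import Path

# Hypothetical label for the taxonomy / Kraken DB refresh. "2024-06" is the
# example from the plan; the real label would be chosen when the change lands.
TAXONOMY_VERSION = "2024-06"

# Assumption: only processed/ and the stages downstream of it depend on the
# taxonomy, so only those get a versioned directory name. "processed" is the
# only stage named in the plan; downstream stages would be added here.
VERSIONED_STAGES = {"processed"}


def stage_dir(bioproject_dir, stage, version=TAXONOMY_VERSION):
    """Return the directory for a stage, adding the version suffix if needed.

    With version=None the old unversioned name is returned, so files
    processed under the 2022-12 taxonomy can still be located.
    """
    if stage in VERSIONED_STAGES and version:
        name = f"{stage}-{version}"
    else:
        name = stage
    return Path(bioproject_dir) / name


# Usage ("bioproject" and "raw" are made-up names for illustration):
new_processed = stage_dir("bioproject", "processed")
old_processed = stage_dir("bioproject", "processed", version=None)
unversioned = stage_dir("bioproject", "raw")
```

Keeping the version in one constant means a later taxonomy update only needs a new label, and the old processed/ files are left untouched.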