nextstrain / mpox

Nextstrain build for mpox virus
https://nextstrain.org/mpox
MIT License

Add submission date to the INRB sequences #243

Closed j23414 closed 2 months ago

j23414 commented 2 months ago

Description of proposed changes

This is a fixup and update of PR https://github.com/nextstrain/mpox/pull/242

Add submission date to the INRB sequences as 2024-04-12 (preprint date), so that they show up properly in the "submission_recency" coloring, etc.

add_ids.pl

```perl
#! /usr/bin/env perl
use strict;
use warnings;

my @TMPIDS = ();
for my $i ("TMP0000" .. "TMP0099") {
    push @TMPIDS, $i;
}

my $i = 0;
while (<>) {
    if (/>(.*)/) {
        my $header = $1;
        print ">$TMPIDS[$i++]";
        print "|2024-04-12";  # <= New
        print "|INRB";
        print "|Africa";
        print "|Democratic Republic of the Congo";
        print "|$header\n";
    } else {
        print;
    }
}
```
view commands

```bash
perl add_ids.pl ingest/submission01_mpox47_2024.fasta > fixedheaders.fasta

./ingest/bin/fasta-to-ndjson \
  --fasta fixedheaders.fasta \
  --fields genbank_accession submitted authors region country strain host ocountry division collected \
  --exclude ocountry \
  > ingest/data/inrb.ndjson
```
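
For anyone reproducing this, a quick sanity check on `fixedheaders.fasta` before converting to NDJSON might look like the sketch below. It is not part of the repository: `check_submission_dates` is a hypothetical helper that only assumes the pipe-delimited header layout produced by `add_ids.pl`, where the second field holds the submission date.

```bash
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not a script from nextstrain/mpox):
# report FASTA headers whose second pipe-delimited field is not YYYY-MM-DD.
check_submission_dates() {
  local fasta="$1"
  awk -F'|' '
    /^>/ {
      total++
      if ($2 !~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/) {
        bad++
        print "missing or malformed date: " $0
      }
    }
    END {
      printf "%d headers checked, %d without a valid date\n", total, bad
      # Non-zero exit status when at least one header fails the check
      exit (bad > 0)
    }
  ' "$fasta"
}
```

Running `check_submission_dates fixedheaders.fasta` prints a one-line summary and returns a non-zero status if any header lacks the date, so it can gate the `fasta-to-ndjson` step with `&&`.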

Before submitting the PR, I ran `nextstrain build ingest` and confirmed that the `curate` rule completed successfully.

Related issue(s)

Checklist

j23414 commented 2 months ago

Tried submitting a test run at: https://github.com/nextstrain/mpox/actions/runs/8898724896