Previously, the original source for a reference file was implicit:
If the accession number starts with LRG_, it came from the LRG FTP
archive.
If a download URL is known, it was downloaded from there.
If slice data is known, it was sliced from the NCBI.
If a GI number is known, it was downloaded from the NCBI.
Otherwise, it was uploaded.
In preparation for the removal of GI numbers (#349), this had to be
revisited. We now store the source explicitly in a new source field
on the Reference model. If additional information is needed to
re-fetch the file from this source (e.g., download URL), this is stored
in a new source_data field (always serialized as a string). This
scheme should be both more explicit and more generic.
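As a rough illustration of how the old implicit rules could map onto the new
fields (for instance when populating them in the migration), here is a minimal
sketch. The source labels ("lrg", "url", "ncbi_slice", "ncbi", "upload"), the
LegacyReference stand-in, its field names, and the colon-separated
serialization of slice data are all assumptions made for this example, not
necessarily what the actual code uses.

```python
from typing import NamedTuple, Optional, Tuple


class LegacyReference(NamedTuple):
    """Hypothetical stand-in for a Reference row with only the old columns."""
    accession: str
    download_url: Optional[str] = None
    slice_accession: Optional[str] = None
    slice_start: Optional[int] = None
    slice_stop: Optional[int] = None
    slice_orientation: Optional[str] = None
    geninfo_identifier: Optional[str] = None


def infer_source(ref: LegacyReference) -> Tuple[str, Optional[str]]:
    """Return (source, source_data) following the old implicit rules, in order.

    The labels and the source_data format are illustrative assumptions.
    """
    if ref.accession.startswith("LRG_"):
        return "lrg", None
    if ref.download_url:
        # The URL is all that is needed to re-fetch the file.
        return "url", ref.download_url
    if ref.slice_accession is not None:
        # source_data is always a string, so the slice is serialized.
        data = "{}:{}:{}:{}".format(
            ref.slice_accession, ref.slice_start,
            ref.slice_stop, ref.slice_orientation)
        return "ncbi_slice", data
    if ref.geninfo_identifier:
        return "ncbi", None
    return "upload", None
```

The checks are ordered exactly as in the list above, so a row that matches
several rules (e.g., an LRG_ accession that also has a download URL) gets the
first matching source.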
Subtasks:
[x] Add source and source_data columns.
[x] Populate columns in migration.
[x] Load some example data for migration tests.
[x] Use the columns in the retriever, remove use of old columns.
[x] Use the columns in cache sync, remove use of old columns.
[x] Check use of old columns elsewhere.
[x] Follow-up: remove slice_* and download_url columns and make source NOT NULL. #388 #389
Track source for reference files