legumeinfo / ArachisPheno

AraPheno source code for http://arapheno.1001genomes.org
MIT License

Finalize accession format #11

Closed svengato closed 4 years ago

svengato commented 4 years ago

I will run through the migration process one more time when we finalize the [accession id, accession name, replicate id] format (with underscore or whatever).

I have been holding off on this in order to implement it in the next migration event (such as adding model fields for distinguishing private data).

  1. Should I update these this morning, before people start looking at ArachisPheno? I can change them directly in the database for now.
  2. As I understand it, the final format will involve underscores: accession id = PI_152111, accession name = PI_152111, replicate id = PI_152111_1?

sdash-github commented 4 years ago

On 2020/4/8 9:24 AM, svengato wrote:

I will run through the migration process one more time when we
finalize the [accession id, accession name, replicate id] format
(with underscore or whatever).

I think Andrew should think about it before we make the underscore format firm. I may not be able to think far enough ahead to see all the implications, particularly for the DS aspect. My only point is that the underscore form is relatively easy to parse and to transform into any other format (with a space, without a space, etc.).

I have been holding off on this in order to implement it in the next migration event (such as adding model fields for distinguishing private data).

  1. Should I update these this morning, before people start looking at ArachisPheno? I can change them directly in the database for now.

I have already sent the link to them.  They might already be looking at it.  They also might want to look at it during the meeting.

adf-ncgr commented 4 years ago

Well, I guess my overall feeling is that GRIN is the authority on germplasm, and URLs like https://npgsweb.ars-grin.gov/gringlobal/accessiondetail.aspx?accid=PI%20490374 pretty clearly demonstrate that "PI 490374" is considered the correct form. It doesn't seem to me that, within the context of the xxxPheno application, we need to worry about filenames with accessions in them, so we might as well stick to the canonical form. If we ever need to link to the DS files, it's easy enough to drop the space (it is less easy to get consistency in the space-altering practices of curators there, but that is a traditional headache we need not try to solve here).

That said, I'd advocate not introducing new spaces in replicate ids (i.e., we could make them "PI 490374-1" or similar).
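
To make the conventions in this comment concrete, here is a minimal Python sketch, not code from the ArachisPheno repository. The function names and the regular expression are assumptions made for illustration; the sketch assumes accessions look like a letter prefix followed by digits, optionally separated by a space or underscore.

```python
import re

# Assumed shape: letters (e.g. "PI"), an optional space or underscore, digits.
_ACCESSION_RE = re.compile(r"^([A-Za-z]+)[ _]?(\d+)$")


def canonical_accession(raw: str) -> str:
    """Return the GRIN-style form with one space, e.g. 'PI_152111' -> 'PI 152111'."""
    match = _ACCESSION_RE.match(raw.strip())
    if match is None:
        raise ValueError(f"unrecognized accession: {raw!r}")
    prefix, number = match.groups()
    return f"{prefix.upper()} {number}"


def replicate_id(accession: str, replicate: int) -> str:
    """Append the replicate number with a hyphen, so no new space is introduced."""
    return f"{canonical_accession(accession)}-{replicate}"


def compact_accession(accession: str) -> str:
    """Drop the space, e.g. when matching identifiers written without one."""
    return canonical_accession(accession).replace(" ", "")
```

For example, `replicate_id("PI_490374", 1)` gives `"PI 490374-1"`, and `compact_accession("PI 490374")` gives `"PI490374"`.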

svengato commented 4 years ago

It would be nice to avoid spaces when looking up accessions in the REST API.

adf-ncgr commented 4 years ago

Can you elaborate? I imagine they'd be URL-encoded as in the above example from GRIN.

svengato commented 4 years ago

We do use %20 and other encoded characters in the ZBrowse URLs all the time. I guess it will be okay.
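
As a quick check of the encoding discussed here, the Python standard library's `urllib.parse.quote` turns the space into `%20`, and `unquote` reverses it. This is only a sketch: the helper name `grin_accession_url` is hypothetical, and the base URL is the GRIN example quoted earlier in this thread.

```python
from urllib.parse import quote, unquote

# Base of the GRIN accession-detail URL quoted earlier in this thread.
GRIN_DETAIL_URL = "https://npgsweb.ars-grin.gov/gringlobal/accessiondetail.aspx"


def grin_accession_url(accession: str) -> str:
    """Build a GRIN detail URL, percent-encoding the accession (space -> %20)."""
    return f"{GRIN_DETAIL_URL}?accid={quote(accession)}"


encoded = quote("PI 490374")   # 'PI%20490374'
decoded = unquote(encoded)     # 'PI 490374'
```

Note that `quote` encodes a space as `%20`, whereas `quote_plus` would produce `+`, which is the form-encoding convention and not what the GRIN URL uses.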

svengato commented 4 years ago

Settled on "PI nnnnnn"