pombase / pombase-chado

PomBase code for accessing Chado
MIT License

Training datasets for ML/AI - publication centric #1181

Status: Open. ValWood opened this issue 4 months ago.

ValWood commented 4 months ago

Create a "publication-centric" file for each publication, containing all of its entities and annotations (all datatypes).

JSON?

kimrutherford commented 4 months ago

JSON makes sense. How urgent is this?

ValWood commented 4 months ago

It would be good to have it in a few weeks, I think, to keep the ball rolling. I'm meeting the ePMC ML person on Monday if you want to join (I've forwarded the invite).

kimrutherford commented 4 months ago

I'll start this on Monday. It might take a couple of days because the existing code needs improving first. A lot of it was written in a hurry for PomBase v2. Now that I've had time (7 years?) to think about it, I can see better ways to do things.

Proposed JSON structure (work in progress):

PMID:

kimrutherford commented 4 months ago

From Zoom: make sure to include annotation comments in the output.
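A minimal Python sketch of one possible publication-centric layout, assuming flat annotation records as input. It groups records by publication ID, then by datatype, and keeps every remaining field (including the annotation comment) with its annotation. The record fields (`pmid`, `datatype`, `gene`, `term`, `comment`) and the helpers `group_by_publication` and `to_json` are hypothetical illustrations, not part of pombase-chado or the structure proposed in this issue.

```python
import json
from collections import defaultdict
from typing import Any

# Hypothetical flat annotation records. Field names and values are
# illustrative assumptions, not PomBase's real schema or data.
SAMPLE_ANNOTATIONS: list[dict[str, Any]] = [
    {"pmid": "PMID:1", "datatype": "GO", "gene": "gene_a",
     "term": "TERM:0000001", "comment": "assayed in vitro"},
    {"pmid": "PMID:1", "datatype": "phenotype", "gene": "gene_b",
     "term": "TERM:0000002", "comment": None},
    {"pmid": "PMID:2", "datatype": "GO", "gene": "gene_a",
     "term": "TERM:0000003", "comment": "inferred from mutant"},
]


def group_by_publication(records: list[dict[str, Any]]) -> dict[str, dict[str, list[dict[str, Any]]]]:
    """Group annotation records by publication ID, then by datatype.

    All fields other than the two grouping keys are kept, so each
    annotation's comment stays attached to it.
    """
    grouped: dict[str, dict[str, list[dict[str, Any]]]] = defaultdict(lambda: defaultdict(list))
    for record in records:
        details = {k: v for k, v in record.items() if k not in ("pmid", "datatype")}
        grouped[record["pmid"]][record["datatype"]].append(details)
    # Convert nested defaultdicts to plain dicts for easy comparison and serialisation.
    return {pmid: dict(by_type) for pmid, by_type in grouped.items()}


def to_json(grouped: dict[str, Any]) -> str:
    """Serialise the publication-centric structure as pretty-printed JSON."""
    return json.dumps(grouped, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(to_json(group_by_publication(SAMPLE_ANNOTATIONS)))
```

Nesting by datatype under each publication keeps all datatypes for one paper in a single object; a real export would presumably read annotations from Chado rather than from a hard-coded list.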

ValWood commented 4 months ago

Related: https://github.com/pombase/pombase-chado/issues/1185. We'll discuss this on the next call.

kimrutherford commented 2 months ago

After the chat with ePMC a while ago, I'm wondering whether it's useful to create a file like this in advance. It sounded like each group uses its own particular file format, so perhaps we should create files when asked? Unless it's an especially wacky format, I think I could create files on request with a 24-hour turnaround.

ValWood commented 2 months ago

OK, keep this on the back burner.