Open ValWood opened 3 months ago
We provide a lexicon of gene names in data downloads, but we don't have a list of all alleles.
We have this: https://curation.pombase.org/dumps/builds/pombase-build-2024-06-20/misc/all_alleles.tsv
Just need to add synonyms.
That's done for the morning.
We also have the same thing in JSON format: https://curation.pombase.org/dumps/latest_build/misc/allele_summaries.json
OK, are these linked from our data downloads section? Maybe we should create a new section: Training datasets for AI/ML/text mining (it could include most datasets, but I could add text describing how each file could be used).
We haven't got a link. We don't have many links to individual files from the downloads page. Mostly links to directories.
We should include our new directory: https://www.pombase.org/public_releases/pombase-2024-06-01/training_data_for_ML_and_AI/ once it's available.
The all_alleles.tsv file is (or will be) available as part of our new release directories:
https://www.pombase.org/public_releases/pombase-2024-06-01/phenotypes_and_genotypes/
although I'd like a better name for the file.
We should probably link to the alleles file in the new release directory structure.
For the ML data we can put a link on the website once we have consolidated the comments file a bit and included that.
Maybe we could include the alleles file in the phenotypes directory and call it phenotypes_alleles or similar?
OK, I've renamed it to phenotype_alleles.tsv
https://www.pombase.org/public_releases/pombase-2024-06-01/phenotypes_and_genotypes/
Sorry about that, I meant the directory name, but we already changed that to phenotypes_and_genotypes, which is better. We should keep this file name as "alleles.tsv", I think.
OK. I've made that change.
A file with the columns gene, allele, allele synonyms and description would probably be useful, since not all alleles match the primary name.
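For anyone wanting to use such a file as a text-mining lexicon, here is a minimal Python sketch of how the proposed columns could be turned into a name lookup. The header names (`gene`, `allele`, `allele_synonyms`, `description`), the `|` synonym separator and the sample rows are all assumptions made up for illustration; the real alleles file may use a different layout.

```python
import csv
import io

# Made-up sample rows for illustration only. The header names and the "|"
# synonym separator are assumptions, not the actual PomBase file format.
SAMPLE_TSV = (
    "gene\tallele\tallele_synonyms\tdescription\n"
    "geneA\tgeneA-1\tgA1|geneA-ts1\tG100D\n"
    "geneB\tgeneB-delta\t\tdeletion\n"
)


def build_allele_lexicon(handle, synonym_sep="|"):
    """Map every allele name or synonym to the (gene, primary allele) pairs it refers to."""
    lexicon = {}
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        # Start with the primary allele name, then add any non-empty synonyms.
        names = [row["allele"]]
        synonyms = row.get("allele_synonyms") or ""
        names.extend(s.strip() for s in synonyms.split(synonym_sep) if s.strip())
        for name in names:
            # A set is used because the same synonym could refer to more than one allele.
            lexicon.setdefault(name, set()).add((row["gene"], row["allele"]))
    return lexicon


lexicon = build_allele_lexicon(io.StringIO(SAMPLE_TSV))
```

To run this against a downloaded copy of the file, replace the `io.StringIO` object with an open file handle and adjust the column names to match the real header.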