Open hexylena opened 8 years ago
Do you have plans for loading data into chado? I mean for data that doesn't come from apollo, since for that there is now the galaxy/apollo stuff. There are the gmod perl scripts, but I must admit I'm not very fond of them!
My plans for data loading are to make those perl scripts available as Galaxy tools. Any scripts in particular?
I'm waiting to find a bit of time to do the galaxy-side portion first (we just need a generic, user-configurable key-value store with predictable parameter names so tools can say "hey I can accept values for `apollo.username`, `apollo.host`, `apollo.password`" and then let the user configure those and re-use across compatible tools) before starting on wrapping those scripts.
Also, any ideas which scripts/what sort of data you will wish to load?
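
To make the key-value store idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration of the concept: `UserKeyValueStore`, `resolve`, and `APOLLO_KEYS` are hypothetical names, not an existing Galaxy API, and the example values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class UserKeyValueStore:
    """Hypothetical per-user settings store; not an existing Galaxy API."""
    values: dict = field(default_factory=dict)

    def set(self, key, value):
        # The user configures a value once, under a predictable name.
        self.values[key] = value

    def resolve(self, accepted_keys):
        """Return stored values for the keys a tool declares, plus the keys still unset."""
        found = {k: self.values[k] for k in accepted_keys if k in self.values}
        missing = [k for k in accepted_keys if k not in self.values]
        return found, missing


# Each compatible tool declares the predictable parameter names it can accept.
APOLLO_KEYS = ["apollo.username", "apollo.host", "apollo.password"]

store = UserKeyValueStore()
store.set("apollo.username", "alice@example.org")   # made-up example value
store.set("apollo.host", "https://apollo.example.org")  # made-up example value

found, missing = store.resolve(APOLLO_KEYS)
print(found)    # values that can be pre-filled for the tool
print(missing)  # keys the user still has to configure
```

Any tool that declares the same key names would get the same values, which is the "configure once, re-use across compatible tools" part.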
The data we usually load is:
I had some trouble with `gmod_bulk_load_gff3.pl`, especially when loading annotations (some features duplicated for unknown reasons, automatically created peptide features with random names, ...), and I ended up writing little scripts to fix the data once loaded, rather than fixing it upstream (bad boy!)...
Ok how is this:
data | loader |
---|---|
genome | apollo |
annotations | apollo, bulk_load |
blast results | bulk_load (blast2gapped gff3 might be useful, would love to see your custom blast GFF3 though, everyone has interesting approaches to this ;)) |
interpro | bulk_load |
blast2go | ??? (is this something we can do in bulk load? Would you be interested in sharing your script? Maybe it is generally applicable and we can make more useful tools for people? Hmmm.) |
ontology | I think we'll have to wrap the xort/ontology loading equipment. I'm a bit concerned about doing it though, need to think a bit more on how to deal with these DBs. |
And yes, similar experiences with bulk_load doing strange things...very similar.
Yes, it could be something like this.

For blast2go, I have to check, but IIRC the loading method should be reworked because it doesn't work well with multiple analyses.

For ontology, well, it's orthology in fact ;) i.e. gene similarities between different species. This script is a little bit too simple; I think we would move to something more like https://github.com/legumeinfo/tripal_phylotree and http://gmod.org/wiki/Chado_Phylogeny_Module to store the data.

Anyway, no problem to share these scripts of course, I just need to find some time to have a look at them first.
> blast2go
ok, sounds good.
> orthology
ahhh interesting. I considered that it might be a typo, since chado people seem so often more concerned with such. Interesting. :)