legumeinfo / microservices

A collection of microservices developed and maintained by the Legume Information System
https://legumeinfo.org/
Apache License 2.0

Implement GFF loading script #10

Closed alancleary closed 3 years ago

alancleary commented 4 years ago

As part of the move away from Chado via microservices, we need to implement a script that loads data into the GCV Redis database directly from GFF files. There is already a script that loads data from Chado, and it will be maintained for migration and adoptability purposes. As such, I'll isolate the part of that script that actually puts data into Redis into its own module so both scripts can share the code. When the Redis schema changes, updating the loading scripts will then be a matter of changing one file.
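A rough sketch of the shared-module idea, not the actual implementation: the GFF parsing lives in one function and everything that knows the Redis layout lives in another, so a second loader (e.g. the Chado one) could reuse the writer unchanged. The names `parse_gff_genes`, `write_genes`, `DictClient`, the `gene:` key prefix, and the field names are all hypothetical; the only assumption about the client is a redis-py style `hset(name, mapping=...)` method.

```python
def parse_gff_genes(lines):
    """Yield one dict per 'gene' feature found in an iterable of GFF3 lines."""
    for line in lines:
        # Skip blank lines, directives, and comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            pair.split("=", 1) for pair in cols[8].split(";") if "=" in pair
        )
        yield {
            "id": attrs.get("ID", ""),
            "chromosome": cols[0],
            "fmin": int(cols[3]),
            "fmax": int(cols[4]),
            "strand": cols[6],
        }


def write_genes(client, genes, prefix="gene:"):
    """Write gene dicts as hashes; the only code that knows the key layout.

    The key prefix and field names are hypothetical, not the GCV schema.
    Returns the number of genes written.
    """
    count = 0
    for gene in genes:
        fields = {k: str(v) for k, v in gene.items() if k != "id"}
        client.hset(prefix + gene["id"], mapping=fields)
        count += 1
    return count


class DictClient:
    """In-memory stand-in for a Redis client, for demonstration only."""

    def __init__(self):
        self.data = {}

    def hset(self, name, mapping):
        self.data.setdefault(name, {}).update(mapping)
        return len(mapping)
```

With this split, a schema change would only touch `write_genes`; each loader is responsible for producing the same gene dicts from its own source.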

The GFF loader should be written in Python, use spaces instead of tabs, and conform to the PEP8 code style guide. Also, use of the loader should be documented in the README.md file of the scripts directory.

This issue is blocked by issue #9 and shouldn't be started until #9 is closed.

sammyjava commented 3 years ago

I don't think there's any safety issue with using :memory:, unless you can document one to convince me otherwise. It's working fine with the full LIS GFF on my laptop, and we're not talking about a lot of RAM here. But feel free to provide a concrete argument against it.
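For readers unfamiliar with the term, here is a minimal illustration, assuming `:memory:` refers to SQLite's special in-memory database name (the thread does not say which library the loader uses). The table layout and the `index_features` and `genes_on` helpers are hypothetical. The database lives entirely in RAM and disappears when the connection is closed, so nothing is written to disk.

```python
import sqlite3


def index_features(rows):
    """Build a throwaway in-memory index of (id, seqid, fmin, fmax) rows."""
    conn = sqlite3.connect(":memory:")  # no file is created on disk
    conn.execute(
        "CREATE TABLE features ("
        "id TEXT PRIMARY KEY, seqid TEXT, fmin INTEGER, fmax INTEGER)"
    )
    conn.executemany("INSERT INTO features VALUES (?, ?, ?, ?)", rows)
    return conn


def genes_on(conn, seqid):
    """Return feature IDs on one sequence, ordered by start position."""
    cur = conn.execute(
        "SELECT id FROM features WHERE seqid = ? ORDER BY fmin", (seqid,)
    )
    return [row[0] for row in cur]
```

The trade-off is that memory use grows with the size of the GFF and the index has to be rebuilt on every run.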

sammyjava commented 3 years ago

Closing this. I've committed the version that loads a single gene GFF/chromosome GFF/gene family GFA set into a clean Redis. I'll put the append mode into a separate issue, since it applies to both gff_to_redisearch.py and chado_to_redisearch.py.