statonlab / tripal_hq

provides a user and administrative dashboard for Chado content creation
GNU General Public License v3.0

Bulk loading? #114

Open spficklin opened 5 years ago

spficklin commented 5 years ago

Is it possible to use Tripal HQ to triage large bulk imports of data? For example, say I wanted to load data via the Tripal Analysis Blast module via web services (that functionality doesn't quite exist yet). Would it be possible to have all of that data triaged with this module before getting loaded?

bradfordcondon commented 5 years ago

All of the metadata could get triaged. You would create the draft analysis with all of its fields filled out, and upload the file.

Right now there is no automation hooked up between the user creating the analysis and uploading the file, and the admin running the importer with said file. That's intentional: we thought the admin would want to vet the input file first and run it themselves.

So, there are no draft features etc. created. It's 1) user requests an analysis to be created, 2) admin approves and the analysis is created automatically by this module, 3) admin manually imports.
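
The three steps above can be modelled as a small state sketch. This is an illustration in Python, not Tripal HQ's actual (PHP/Drupal) code, and the class, field, and method names are all hypothetical. It only shows that approval creates the analysis record while the uploaded file is left untouched for the admin to import by hand.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"


@dataclass
class AnalysisRequest:
    """Hypothetical stand-in for a draft analysis submitted through HQ."""
    name: str
    uploaded_file: str
    status: Status = Status.PENDING
    analysis_created: bool = False
    file_imported: bool = False

    def approve(self) -> None:
        # Step 2: admin approval creates the analysis automatically.
        self.status = Status.APPROVED
        self.analysis_created = True
        # Deliberately no import here: the file is left for the admin to vet.

    def import_file(self) -> None:
        # Step 3: the admin runs the importer manually, after approval.
        if self.status is not Status.APPROVED:
            raise RuntimeError("cannot import before approval")
        self.file_imported = True


# Step 1: the user requests an analysis and uploads a file.
request = AnalysisRequest(name="example analysis", uploaded_file="results.xml")
request.approve()
print(request.analysis_created, request.file_imported)
request.import_file()
print(request.file_imported)
```

Keeping `import_file` as a separate call mirrors the intentional gap described above: nothing is loaded until the admin decides to run the importer.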

laceysanderson commented 5 years ago

KnowPulse also needs functionality to triage bulk loading, specifically TripalImporter jobs. For example, with the submission of genetic maps and QTL, not all the needed metadata can/should be associated with a single Tripal content page. In these cases, the TripalImporter provides a convenient common place for all metadata and the file.

I can see a design which shows the TripalImporter form to users, validates based on the TripalImporter, saves the submitted file/metadata for review by an administrator and maybe even handles submission of the TripalImporter job once approved.

I have some developer hours to develop such a module and a relatively tight timeline (essentially ASAP 😝 ) and am wondering if you think I should make a stand-alone complement to Tripal HQ or if you would recommend closer collaboration? Maybe adding data loaders alongside content in the add new content listing and including them in the submission listings but storing them in separate tables? I'd love to hear your thoughts! :-)

mestato commented 5 years ago

While the initial HQ design made sense (focus on metadata), I can see how HQ should be coupled somehow to the actual data upload to be maximally useful. IMO, overlapping functionality in a separate module would be a shame.

@laceysanderson could you elaborate a bit more on the last idea you put in your comment, with adding data loaders but using separate tables? I'm not sure what the separate tables would be.

It does seem like TripalImporter would be involved in any solution. I'll try to learn more and maybe we could brainstorm how this could work.

laceysanderson commented 5 years ago

> @laceysanderson could you elaborate a bit more on the last idea you put in your comment, with adding data loaders but using separate tables? I'm not sure what the separate tables would be.

> Maybe adding data loaders alongside content in the add new content listing and including them in the submission listings but storing them in separate tables?

Definitely! I put together some quick screenshots to make it more clear. On the user submit content page:

[Screenshot: Screen Shot 2019-10-10 at 2 16 32 PM]

The concept here is that, just as the user can currently create content through HQ, this would allow them to submit a TripalImporter form in the same way. Their submission details would be stored in a Drupal table until approved. Once the submission is approved, the importer would be run as a Tripal Job.
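
To make the store-then-run idea concrete, here is a minimal sketch of that lifecycle. It is written in Python with an in-memory SQLite database purely for illustration; it is not Drupal or Tripal code, and the table names, column names, and the `ExampleMapImporter` class name are all assumptions.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table holding importer form submissions awaiting review.
conn.execute(
    "CREATE TABLE importer_submission ("
    " id INTEGER PRIMARY KEY,"
    " importer_class TEXT NOT NULL,"
    " form_values TEXT NOT NULL,"
    " status TEXT NOT NULL DEFAULT 'pending')"
)
# Hypothetical stand-in for the job queue that would run the importer.
conn.execute(
    "CREATE TABLE job_queue ("
    " id INTEGER PRIMARY KEY,"
    " submission_id INTEGER NOT NULL REFERENCES importer_submission(id))"
)


def submit(importer_class: str, form_values: dict) -> int:
    """Store the submitted form values; nothing is queued yet."""
    cur = conn.execute(
        "INSERT INTO importer_submission (importer_class, form_values)"
        " VALUES (?, ?)",
        (importer_class, json.dumps(form_values)),
    )
    return cur.lastrowid


def approve(submission_id: int) -> int:
    """Mark a pending submission approved and queue exactly one job for it."""
    cur = conn.execute(
        "UPDATE importer_submission SET status = 'approved'"
        " WHERE id = ? AND status = 'pending'",
        (submission_id,),
    )
    if cur.rowcount != 1:
        raise ValueError("submission is missing or not pending")
    job = conn.execute(
        "INSERT INTO job_queue (submission_id) VALUES (?)", (submission_id,)
    )
    return job.lastrowid


sid = submit("ExampleMapImporter", {"file": "map.tsv"})
jobs_before = conn.execute("SELECT COUNT(*) FROM job_queue").fetchone()[0]
job_id = approve(sid)
jobs_after = conn.execute("SELECT COUNT(*) FROM job_queue").fetchone()[0]
```

Guarding the update with `status = 'pending'` means a second approval of the same submission fails instead of queueing a duplicate job.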

My thought was similar for the Administrative dashboard:

[Screenshot: Screen Shot 2019-10-10 at 2 23 21 PM]

This would allow for seamless integration with Tripal HQ and ensure administrators could go to a single dashboard to approve all content whether it's being submitted through Tripal Content Forms or TripalImporters.

As for how to accomplish this, I'm happy to start development on an optional submodule for Tripal HQ to handle this if it fits in well with your goals for the module. I hope that by keeping it optional, it ensures Tripal HQ still works for its original audience but also expands the use case to bulk data.

If you want, I can start development of the submodule in a separate repository, making it dependent on Tripal HQ so I can use the API and assume pages already exist. Then I would just make PRs to this module where I need hooks or API functions developed. If in the end it feels synergistic, I'd be happy to contribute the module back :-)

laceysanderson commented 5 years ago

I have submitted a PR to allow me to hook into the user and administrative dashboards. This is needed whether we go with a separate extension module or an embedded submodule and ensures there is very little code duplication.

You can see my progress on the extension module here: https://github.com/UofS-Pulse-Binfo/tripal_hq_imports. I took my suggested route in the interest of saving time, as I do need this functionality right away. This route still supports both options (1. we move my module into Tripal HQ core as a submodule, 2. we keep it separate as a well-integrated extension module). I will try to keep the README up to date with current functionality so you can see my progress and would love any input you have on design, etc. :-)

spficklin commented 5 years ago

@laceysanderson will this extension work with the Blast/InterPro modules? If not, can it be extended to do so?

Also, just curious: should your extension be part of Tripal_HQ instead of an extension module of it? A separate extension module creates a multi-level chain of dependencies.

laceysanderson commented 5 years ago

I just took a quick look at the importers for the Blast/InterPro modules and yes, it should! If you provide some test files, I would love to test it.

As for whether my module should live within or as an extension: I would love for my module to live within this repository as an optional sub-module of Tripal HQ and have developed it with that in mind. I'm just waiting to see if the finished module feels like a good fit for the repository owners.

bradfordcondon commented 5 years ago

I'll chime in that we already have an HQ submodule for permissions, so adding another is no problem, I would think.