hallamlab / metapathways2

MetaPathways v2.0: A master-worker model for environmental Pathway/Genome Database construction on grids and clouds
http://hallam.microbiology.ubc.ca/MetaPathways/

MP provides no way to specify the OrgID for the ePGDB #54

Closed taltman closed 7 years ago

taltman commented 9 years ago

Choosing the OrgID, the full name, and the abbreviated name of the ePGDB is important, because these are hard to change later on (especially the OrgID). The user should be able to set them in the GUI, especially when they select multiple sequence files and it is not obvious what the base name should be.

In my case, my assembler calls every output "scaffolds.fasta", so MP is trying to use "scaffolds" as both the name and the OrgID for several ePGDBs. MP should be smart enough to know that several ePGDBs cannot all have the same OrgID.
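To illustrate the collision described above, here is a minimal sketch that groups input files by the OrgID their base name would produce. It assumes the OrgID is simply the file name without its extension, as the behaviour reported here suggests. The function name `find_orgid_collisions` is hypothetical and is not part of MetaPathways.

```python
from collections import defaultdict
from pathlib import Path


def find_orgid_collisions(input_files):
    """Return {orgid: [files]} for every OrgID claimed by more than one file.

    Assumption: the OrgID is the input file's base name without extension.
    """
    by_orgid = defaultdict(list)
    for path in input_files:
        by_orgid[Path(path).stem].append(path)
    # Keep only the OrgIDs that more than one input file maps to.
    return {orgid: files for orgid, files in by_orgid.items() if len(files) > 1}


if __name__ == "__main__":
    inputs = ["run1/scaffolds.fasta", "run2/scaffolds.fasta", "run3/sample1.fasta"]
    for orgid, files in find_orgid_collisions(inputs).items():
        print(f"OrgID '{orgid}' would be shared by: {', '.join(files)}")
```

A check like this only reports the conflict; it deliberately does not rename anything.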

nielshanson commented 9 years ago

Generally we try to tell people to name their initial files intelligently, because in the downstream analysis they are going to have to rely on their namespace for integration. All I can think of here is to put some recommended naming conventions for samples in the documentation, because I don't think it's a good idea analytically to allow the code to just rename samples sequentially.

taltman commented 9 years ago

I agree that it is often tricky to automagically rename things, but two points:

  1. MP should never set up the data so that it will have a conflict with a previous run
  2. MP should provide instructions on how to specify the OrgID and names (full & abbreviated) in the MP input

For point 1, MP should be able to check whether a given OrgID is already in use (locally), and warn the user that processing the current sample will produce a conflicting name. For point 2, there should be a field in the GUI (and perhaps a line in the parameter file for the CLI) to specify the desired OrgID, full name, and abbreviated name.
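A rough sketch of the local check suggested for point 1 might look like the following. It assumes each local PGDB lives in a folder named `<orgid>cyc` inside a single PGDB directory, and that OrgIDs compare case-insensitively. Both are assumptions for illustration, and the function names are hypothetical rather than MetaPathways or Pathway Tools APIs.

```python
import os
import warnings


def orgid_in_use(orgid, pgdb_dir):
    """Return True if a PGDB folder for `orgid` already exists in `pgdb_dir`.

    Assumption: PGDB folders are named "<orgid>cyc", compared case-insensitively.
    """
    if not os.path.isdir(pgdb_dir):
        return False
    target = orgid.lower() + "cyc"
    return any(name.lower() == target for name in os.listdir(pgdb_dir))


def warn_if_orgid_taken(orgid, pgdb_dir):
    """Warn (without renaming anything) if `orgid` would clash with a local PGDB."""
    taken = orgid_in_use(orgid, pgdb_dir)
    if taken:
        warnings.warn(
            f"OrgID '{orgid}' is already in use in {pgdb_dir}; "
            "processing this sample would create a conflicting ePGDB."
        )
    return taken
```

The sketch only warns and leaves the choice of a new OrgID to the user, in line with the concern about automatic renaming raised earlier in the thread.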

hallamlab commented 7 years ago

Point 1 is addressed in the upcoming release. Point 2 will have to be tabled for a future version!