taltman closed this issue 7 years ago
Generally we tell people to name their initial files intelligently, because in downstream analysis they will have to rely on their namespace for integration. All I can think of here is to put some recommended sample-naming conventions in the documentation, because I don't think it's a good idea analytically to let the code just rename samples sequentially.
I agree that it is often tricky to automagically rename things, but two points:
For point 1, MP should be able to check whether a given OrgID is already in use (locally) and warn the user that processing the current sample would produce a conflicting name. For point 2, there should be a field in the GUI (and perhaps a line in the parameter file for the CLI) to specify the desired OrgID, full name, and abbreviated name.
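To make point 1 concrete, here is a minimal sketch of such a local check. It is not MP's actual code. It assumes that each local ePGDB lives in a directory named `<orgid>cyc` under a single PGDB root folder, and the function names `orgid_in_use` and `conflict_warning` are hypothetical.

```python
from pathlib import Path
from typing import Optional


def orgid_in_use(org_id: str, pgdb_root: Path) -> bool:
    """Return True if a directory named '<org_id>cyc' exists under pgdb_root.

    Assumption: local ePGDBs are stored as '<orgid>cyc' directories.
    The comparison is case-insensitive.
    """
    if not pgdb_root.is_dir():
        return False
    target = f"{org_id}cyc".lower()
    return any(p.is_dir() and p.name.lower() == target
               for p in pgdb_root.iterdir())


def conflict_warning(org_id: str, pgdb_root: Path) -> Optional[str]:
    """Build a warning message if org_id is already taken, else return None."""
    if orgid_in_use(org_id, pgdb_root):
        return (f"Warning: OrgID '{org_id}' is already in use under "
                f"{pgdb_root}; processing this sample would create a "
                f"conflicting name.")
    return None
```

The check only warns and leaves the choice of a new OrgID to the user, in line with the concern above about renaming samples automatically.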
Point 1 is addressed in the upcoming release. Point 2 will have to be tabled for a future version!
Choosing the OrgID, the full name, and the abbreviated name of the ePGDB is important because these are hard to change later (especially the OrgID). The user should be able to set them in the GUI, especially when they select multiple sequence files and it is not obvious what the base name should be.
In my case, my assembler calls every output "scaffolds.fasta", so MP is trying to give several ePGDBs the same name and OrgID, "scaffolds". MP should be smart enough to know that several ePGDBs cannot all have the same OrgID.
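The collision described here could be caught before any processing starts. The sketch below is illustrative only and is not how MP derives names internally. It assumes the OrgID defaults to the lowercased input file name without its extension, and `find_orgid_collisions` is a hypothetical helper.

```python
from collections import defaultdict
from pathlib import Path


def find_orgid_collisions(input_files):
    """Group input files by their default OrgID and return only the clashes.

    Assumption: the default OrgID is the file's base name without its
    extension, lowercased (e.g. 'run1/scaffolds.fasta' -> 'scaffolds').
    """
    by_orgid = defaultdict(list)
    for f in input_files:
        by_orgid[Path(f).stem.lower()].append(str(f))
    # Keep only OrgIDs claimed by more than one input file.
    return {org_id: files for org_id, files in by_orgid.items()
            if len(files) > 1}


if __name__ == "__main__":
    inputs = ["run1/scaffolds.fasta", "run2/scaffolds.fasta",
              "run3/contigs.fasta"]
    for org_id, files in find_orgid_collisions(inputs).items():
        print(f"OrgID '{org_id}' would be shared by: {', '.join(files)}")
```

A check like this only reports the clash, so the user can still pick meaningful names instead of having the samples renamed sequentially.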