Remaining trait-insertion API issues (separated out of issue #124)

gsrohde commented 8 years ago

[ ] 1. Add GitBook documentation of JSON and XML format POST requests (#315)
[x] 2. Consider making the API key optional (the curl -u option can be used instead)
[ ] 3. Check the error response to SQL validation errors
[ ] 4. Add more testing
[x] 5. Check that CSV API accepts various kinds of line endings
[x] 6. Check that POST APIs handle character encodings properly (#316)
[x] 7. Update the documentation example for posting data to specify adding the -H option to curl.
[x] 8. Add support for an "entity" column in uploaded CSV files. (This is issue https://github.com/PecanProject/bety/issues/437.)
[x] 9. Ensure treatments are consistent with the citation.
[x] 10. Address issue https://github.com/PecanProject/bety/issues/430.
[ ] 11. Address issue https://github.com/PecanProject/bety/issues/449.
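Items 2, 5, 6 and 7 all touch on how a client prepares a POST. A minimal Python sketch, assuming the CSV body is sent as UTF-8 text with a `Content-Type: text/csv` header (one plausible use of curl's -H option) and that HTTP Basic authentication, which is what `curl -u` sends, is accepted in place of an API key. The URL and the helper names `normalize_csv` and `build_request` are hypothetical; the request is only built here, not sent.

```python
import base64
import csv
import io
import urllib.request

# Hypothetical endpoint; the real BETYdb path and format suffix may differ.
BASE_URL = "https://example.org/bety/api/v1/traits.csv"


def normalize_csv(raw: bytes) -> str:
    """Decode an uploaded CSV and convert CRLF and bare CR line endings to LF."""
    # "utf-8-sig" also strips a leading byte-order mark if one is present.
    text = raw.decode("utf-8-sig")
    return text.replace("\r\n", "\n").replace("\r", "\n")


def build_request(raw: bytes, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a POST roughly equivalent to curl -u user:pass -H ... --data-binary."""
    body = normalize_csv(raw).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        BASE_URL,
        data=body,
        method="POST",
        headers={
            # Like curl -H: tell the server how to parse the body (assumed header).
            "Content-Type": "text/csv; charset=utf-8",
            # Like curl -u: HTTP Basic authentication instead of a key parameter.
            "Authorization": f"Basic {token}",
        },
    )


if __name__ == "__main__":
    # Mixed line endings plus a byte-order mark, as a spreadsheet export might produce.
    upload = "\ufeffsite,trait\r\nplot1,4.2\rplot2,5.1\n".encode("utf-8")
    req = build_request(upload, "alice", "secret")
    rows = list(csv.reader(io.StringIO(req.data.decode("utf-8"))))
    print(req.get_method(), rows)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen`. Normalizing on the client side is only a belt-and-braces measure; items 5 and 6 are about the server accepting these variations itself.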

ghost commented 8 years ago

@dlebauer - is this a priority for the V0 release?

max-zilla commented 8 years ago

From @gsrohde in https://github.com/terraref/computing-pipeline/issues/147:

I think that the API rolls everything back if there is a row it can't insert; in other words, if you tried to do the API call without all the metadata being there, no harm would be done. I have to double-check this.

I still have to deploy the last few changes to the insertion API having to do with the traits issue we discussed yesterday. I plan to do a release today (unlikely), tomorrow (possibly), or early next week, after which we should be good to go.
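The all-or-nothing behaviour described above (which still needed double-checking) corresponds to wrapping every insert from one upload in a single database transaction. A minimal sketch using Python's built-in sqlite3 module, not BETYdb's own code; the `traits` table here has hypothetical columns `variable` and `mean`, and its NOT NULL constraints stand in for whatever validation makes a row uninsertable.

```python
import sqlite3


def insert_all_or_nothing(conn: sqlite3.Connection, rows: list[tuple]) -> bool:
    """Insert every row in one transaction; on any failure, keep none of them."""
    try:
        # Used as a context manager, the connection commits if the block succeeds
        # and rolls back the whole transaction if it raises.
        with conn:
            conn.executemany("INSERT INTO traits (variable, mean) VALUES (?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE traits (variable TEXT NOT NULL, mean REAL NOT NULL)")
    # The second row violates NOT NULL, so the first row must not survive either.
    ok = insert_all_or_nothing(conn, [("height", 1.2), ("height", None)])
    print(ok, conn.execute("SELECT COUNT(*) FROM traits").fetchone()[0])
```

The point of the design is that a bad row surfaces as an exception inside the transaction, so one failure undoes the whole upload instead of leaving a partially inserted file behind.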

The PlantCV extractor is running. When these changes are released, I will update the PlantCV extractor to point to terraref.ncsa.illinois.edu/bety with an API key I generate, and we are good to go.

gsrohde commented 8 years ago

@max-zilla I think I'm done with all the high-priority items in this issue (except for updating the documentation, which isn't quite done), but I won't do a new release until sometime next week. (Precisely when depends on how much else I want to try to include.) In the meantime, you are welcome to pull the updates to your own BETYdb copy and re-test (if that's useful and feasible).

The main changes in how things work are as follows. If your CSV files will always have an entity column that is not blank in any row, then probably only points 1 and 3 will be of interest.

1. The treatment (if any) has to be consistent with the citation. That is, there must be a row in the citations_treatments table associating the treatment you name with the citation you select. (The treatment selected from the treatments table will be the one whose name not only matches the value in the CSV-file treatment column but is also associated with the citation matching the CSV-file citation column(s).) A fortiori, the CSV file can't have a treatment column unless it has one or more citation columns.
2. It used to be that even if the CSV file didn't have an entity column, an anonymous entity (one having the empty string in the name column) would be created for each CSV-file row, and each trait-table row created from the data in that CSV-file row would be associated with the newly created anonymous entity. Now this only happens if at least two traits are created per CSV-file row. (I didn't see any use in creating an anonymous entity connected to only a single trait; it doesn't add any information.)
3. Entities are re-used if the value of the entity column in the CSV file matches the value in the name column of an entity in the entities table. The exception is that entities with blank names are never re-used. (Eventually, I hope, the entities table will be constrained to require non-blank names to be unique.)
4. If your CSV file has an entity column but only one trait is created per CSV row, then the entity column must be non-blank in every row of the CSV file. (I could change this so that the column value could sometimes be blank and either (i) no entity would be created for rows with blank entity column values, or (ii) an anonymous entity would be created if the entity column value were blank. Again, I don't see much use for option (ii), since I don't see any point to an anonymous entity connected to a single trait.)
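The four points above amount to a small decision procedure per CSV row. The following Python sketch uses a hypothetical in-memory `Db` class in place of the treatments, citations_treatments, and entities tables; it illustrates the rules as stated, not BETYdb's implementation. Two behaviours are assumptions not spelled out in the thread: a treatment name matching more than one treatment for the same citation is treated as an error, and a blank entity value in a row that creates two or more traits yields a new anonymous entity.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Db:
    """Hypothetical stand-in for the treatments, citations_treatments and entities tables."""
    treatments: dict[int, str]                  # treatment id -> treatment name
    citations_treatments: set[tuple[int, int]]  # (citation id, treatment id) pairs
    entities: dict[int, str] = field(default_factory=dict)  # entity id -> entity name

    def find_treatment(self, name: str, citation_id: int | None) -> int:
        """Point 1: the named treatment must be associated with the row's citation."""
        if citation_id is None:
            raise ValueError("a treatment column requires citation column(s)")
        matches = [tid for tid, tname in self.treatments.items()
                   if tname == name and (citation_id, tid) in self.citations_treatments]
        if len(matches) != 1:  # assumption: zero or several matches are both errors
            raise ValueError(f"no unique treatment {name!r} for citation {citation_id}")
        return matches[0]

    def _new_entity(self, name: str) -> int:
        eid = max(self.entities, default=0) + 1
        self.entities[eid] = name
        return eid

    def entity_for_row(self, entity_name: str | None, traits_in_row: int) -> int | None:
        """Points 2-4. entity_name is None when the CSV file has no entity column."""
        if entity_name:
            # Point 3: re-use an entity with the same non-blank name, else create one.
            for eid, ename in self.entities.items():
                if ename == entity_name:
                    return eid
            return self._new_entity(entity_name)
        if entity_name is not None and traits_in_row == 1:
            # Point 4: with one trait per row, a present entity column may not be blank.
            raise ValueError("entity column must be non-blank for single-trait rows")
        if traits_in_row >= 2:
            # Point 2: anonymous entity; blank names are never matched for re-use.
            # (Assumption: the same applies to a blank value in an entity column.)
            return self._new_entity("")
        return None  # no entity column and one trait: no entity is created
```

Because blank names never reach the re-use loop, every anonymous entity is fresh, which matches point 3's exception for blank names.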

max-zilla commented 8 years ago

I have updated the PlantCV extractor config with the terraref.ncsa.illinois.edu/bety instance and my API key there (account mburnet2). The Clowder upload process is now into the first 2 weeks of September, so we should start seeing some PlantCV extractions trigger over the weekend or very early next week.

ghost commented 8 years ago

@gsrohde - please update this issue

dlebauer commented 7 years ago

@gsrohde will you be able to finish this up by the end of May? If so, please change the milestone; if not, please convert it to an epic and break it into smaller pieces that you can complete. (Same for #124.)

terraref / computing-pipeline #172