Closed: matentzn closed this issue 1 year ago.
The argument for `-t` has to be the name of a template in the templates folder. We should make it such that it can be any of:
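For reference, a minimal sketch of the current usage (the template and input file names here are placeholders; the `extract -t ... -i ...` form is the one quoted later in this thread):

```bash
# `-t` must currently name a template in the templates folder
ontogpt extract -t my_template -i input.txt
```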
For now I would suggest making PRs, and we will err on the side of over-including different people's schemas in the main repo (people can always maintain long-running forks if they don't want to share). As we may need to refactor a bit, having them all in mono-repo style is probably best for now. But there will come a point where this won't scale... not there yet though!
Should I create a LinkML enhancement ticket (in the linkml repo) for your "we should make it" list?
This would be a piece of ontogpt functionality.
@matentzn I think I was able to get a custom one up and running in my fork. The specific YAML file can be found here. I did encounter a number of (undocumented?) challenges getting it up and running, though, which I can recount here.
To the maintainers: If this is the incorrect place to document these, let me know and I can create a new issue, but it seems topically related to the implementation of custom LinkML models.
1. `make` (and the Makefile's use of `datamodel-codegen`) has a tendency to mangle the capitalization conventions of the field names in your YAML file. This results in errors when the code attempts to populate the corresponding fields and/or validate the post-GPT query structure. I haven't dug too deeply into this, but the `datamodel-codegen` documentation makes reference to snake-casing options, and potentially other settings, so there may be a parameter to avoid this behavior (see sketch 1 at the end of this comment).
2. The `Makefile` documentation is (was?) inaccurate.
3. The `--cache-db` option for `extract` is extremely useful for debugging and diagnosing issues arising during the implementation of a custom workflow. I think it's definitely worth mentioning in a more prominent fashion in the documentation / README (sketch 2 below).
4. `enum` fields don't quite seem to work as viably as one might hope. It appears that this is because the descriptions one provides in the enum component of the YAML file don't actually make it into the compiled `.py` file, and so aren't seen by the GPT model in the resultant prompt, though the main field descriptions are (sketch 3 below). This results in responses that are in accordance with the main field description but violate the `enum` requirement, so the resultant structure fails `pydantic` validation. Realistically, though, this may be a boon, considering the token limit for the GPT prompt and response (4097).
5. The `max_tokens` setting can cause issues with larger prompt models. Perhaps there's a way to enable a pass-through parameter or to dynamically calculate the permissible remainder for the response (sketch 4 below)?
6. Using the `Makefile` requires the cloned repo (as I don't think it is included in the `pip install ontogpt` version?). However, making the `ontogpt extract -t [model] -i [targetText]` call (probably?) uses the `ontogpt` instance that is associated with your Python environment (which may not be the cloned repo?). It's quite likely that this approach to using ontogpt is not the recommended method (e.g. there's a better way to associate the custom fork of ontogpt with the Python environment), but my solution was to copy the files over to the environment's `site-packages` directory (sketch 5 below).

That's all that I can recall at the moment. Thank you to the maintainers for this powerful new workflow! Also, let me know if I should create a separate thread for this post.
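Sketch 1, for the casing issue in point 1: `datamodel-codegen` exposes a `--snake-case-field` flag that controls field-name casing. Whether the Makefile's invocation can simply add it is an assumption on my part, and the file names below are placeholders:

```bash
# hypothetical invocation; the real one lives in the repo's Makefile
datamodel-codegen \
  --input my_template.schema.json \
  --input-file-type jsonschema \
  --snake-case-field \
  --output my_template.py
```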
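Sketch 2, the `--cache-db` usage I found so helpful (the database path is arbitrary):

```bash
# cached GPT responses can then be inspected between runs
ontogpt extract -t my_template -i input.txt --cache-db cache.db
```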
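Sketch 3, an illustrative (made-up) LinkML snippet showing the two kinds of descriptions from point 4: the slot-level description reaches the prompt, while the per-value descriptions under the enum did not appear to:

```yaml
enums:
  SentimentEnum:
    permissible_values:
      POSITIVE:
        description: the passage expresses a favorable opinion    # not seen by the model
      NEGATIVE:
        description: the passage expresses an unfavorable opinion # not seen by the model
slots:
  sentiment:
    description: the overall sentiment of the passage  # this one does reach the prompt
    range: SentimentEnum
```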
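Sketch 4, what I mean by "dynamically calculate the permissible remainder" in point 5. This is not something ontogpt does today as far as I know; it's just the idea, assuming the `tiktoken` library:

```python
import tiktoken

MODEL_LIMIT = 4097  # combined prompt + completion limit mentioned above

def remaining_tokens(prompt: str, model: str = "gpt-3.5-turbo") -> int:
    """Tokens left for the completion after the prompt is counted."""
    encoding = tiktoken.encoding_for_model(model)
    return max(0, MODEL_LIMIT - len(encoding.encode(prompt)))
```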
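Sketch 5, a possibly cleaner alternative to copying files into `site-packages` for point 6: an editable install of the fork (standard pip behavior, though I haven't confirmed it's the maintainers' recommended route):

```bash
cd ontogpt        # the cloned fork
pip install -e .  # the environment's ontogpt now resolves to the clone
```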
Not sure if this is the right place for this question -- is there a way to integrate a custom LinkML schema with the pip-installed version of the package, as opposed to needing to fork the repo?
Hi @serenalotreck, this question doesn't seem that closely related to this issue. Can you open a new issue, or try asking on the LinkML community Slack?
This question is still related, but I'll transfer it to its own issue so I ensure it gets into the documentation.
I tried
but got: