Open brent-hartwig opened 1 month ago
@azaroth42, would taking the triples out of the documents hinder another group's reuse or consumption of the data? The triples presently do not leave MarkLogic, but that could change. My concern is having to reproduce the logic. Then again, the TDEs would be available too.
TF 10/22: Related to the ML conference discussion of using TDEs to create triples. The Python code would need to be investigated to find all if statements. The conditions from those if statements would then be added to the XPath, which then looks for the triple.
What is the benefit of doing it this way?
Unsure how to generate a list while the data pipeline is running.
Should we defer this conversation until the data scientist is hired? For capacity reasons? Unsure.
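One way to scope the investigation step from the 10/22 notes is a static scan of the pipeline's Python source for if conditions. The following is only a minimal sketch, assuming Python 3.9+ (for `ast.unparse`); the function name `list_if_conditions` and the `SAMPLE` snippet are hypothetical and not taken from the pipeline.

```python
import ast


def list_if_conditions(source: str) -> list[tuple[int, str]]:
    """Return (line number, condition text) for every if/elif and
    conditional expression found in the given Python source."""
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        # An elif is represented as a nested ast.If, so ast.walk reaches it too.
        if isinstance(node, (ast.If, ast.IfExp)):
            found.append((node.lineno, ast.unparse(node.test)))
    return sorted(found)


# Hypothetical stand-in for pipeline code; not the real logic.
SAMPLE = '''
def emit(record):
    if record.get("type") == "Person":
        return "agent"
    elif "born" in record:
        return "birth"
    return None
'''

if __name__ == "__main__":
    for lineno, condition in list_if_conditions(SAMPLE):
        print(f"line {lineno}: {condition}")
```

A static scan like this lists every condition in the code, whether or not real data ever reaches it. Generating the list while the pipeline runs would instead show which conditions actually fire.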
@azaroth42 and @clarkepeterf, this could be a first step in taking the triples out of the data and using TDEs to populate the triple store upon load.
I created the ticket in this repo because this is where the logic lives today, and because of Peter's idea to generate the list while the data pipeline is running.
Given the anticipated number of XPaths and the advice to keep TDEs small, consider the possibility of using the output of this ticket as input for initially generating the TDEs.
I defer to you to figure out if/when to implement.
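To illustrate how the list from this ticket might seed small TDEs, here is a rough sketch that groups entries by context XPath and caps the number of triples per template. Everything about the input is an assumption: the entry keys (`context`, `subject`, `predicate`, `object`), the example values, and the `max_triples` limit are hypothetical. The output follows the general JSON shape of a TDE template (`template`, `context`, `triples`, each part with a `val`), and it would need to be validated in MarkLogic before use.

```python
import json
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def build_templates(entries, max_triples=2):
    """Group entries by context XPath, then emit one small
    template skeleton per chunk of at most `max_triples` triples."""
    by_context = {}
    for entry in entries:
        by_context.setdefault(entry["context"], []).append(entry)

    templates = []
    for context, group in by_context.items():
        for chunk in chunked(group, max_triples):
            triples = [
                {
                    "subject": {"val": e["subject"]},
                    "predicate": {"val": e["predicate"]},
                    "object": {"val": e["object"]},
                }
                for e in chunk
            ]
            templates.append({"template": {"context": context, "triples": triples}})
    return templates


# Hypothetical entries; real XPaths and IRIs would come from the generated list.
ENTRIES = [
    {"context": "/json", "subject": "sem:iri(id)",
     "predicate": "sem:iri('http://example.org/p1')", "object": "xs:string(type)"},
    {"context": "/json", "subject": "sem:iri(id)",
     "predicate": "sem:iri('http://example.org/p2')", "object": "xs:string(_label)"},
    {"context": "/json", "subject": "sem:iri(id)",
     "predicate": "sem:iri('http://example.org/p3')", "object": "sem:iri(born/id)"},
]

if __name__ == "__main__":
    print(json.dumps(build_templates(ENTRIES), indent=2))
```

Capping triples per template is only one possible reading of "small". Splitting by record type or by predicate could serve equally well.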