vivo-community / scholars-discovery

BSD 3-Clause "New" or "Revised" License

Update TDB binary data #234

Closed — nymbyl closed this 4 years ago

nymbyl commented 4 years ago

NOTE: several problems emerged with trying to update the binary TDB data (via merge):

  1. Binary diffs in Git are (not surprisingly) not workable
  2. So it's a 'merge conflict' that has to be manually resolved (I think just by 'add'ing the files)
  3. Typically I'd do that locally, but since this requires review I'm guessing I can't do that (not really sure, though)
  4. Git seems to be listing things that are already in the code as changes (I don't know why)

Anyway, because of all that, I'm marking this as a draft. Maybe somebody with more experience in these things knows the best way forward. It's really all just to delete 4 erroneous grants.

ghost commented 4 years ago

I am not sure what to make of the diff. It seems the OIT-ADS-Web:master branch is behind scholars-discovery:master. Could you try merging scholars-discovery:master into OIT-ADS-Web:master? There are 9 commits, and only one is for updating the TDB files.

nymbyl commented 4 years ago

Well - after a discussion with Richard this morning - his recommendation was not to merge in scholars-discovery master, for some reason. Possibly because the merge conflicts would revert to the old TDB files? Not sure; I will have to check with him.

ghost commented 4 years ago

To get around that: copy the triplestore directory somewhere else, merge upstream master, delete the triplestore directory, then copy the triplestore directory back.
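Those steps could look something like this - a sketch, assuming the TDB binaries live in `triplestore/` and the upstream repo is configured as a remote named `upstream` (both names are assumptions here):

```shell
# Stash the updated TDB files outside the repo so the merge can't clobber them
cp -r triplestore ../triplestore-backup
# Merge upstream; the binary files will show up as conflicts
git merge upstream/master
# Throw away whatever the merge left and restore the updated files
rm -rf triplestore
cp -r ../triplestore-backup triplestore
# Stage the restored directory as the conflict resolution and finish the merge
git add triplestore
git commit -m "Merge upstream master, keeping updated TDB files"
```

Since binary files can't be merged, `git add`-ing the restored copies is the whole resolution - Git just records "ours" as the result.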

ghost commented 4 years ago

This diff seems to be including the original commits that added the TDB. That may be why there is no diff on the binaries. I don't expect a file diff, but I do expect a diff in file size.

nymbyl commented 4 years ago

As a side note - I'm wondering if it'd be worth the effort to make a *.ttl file harvester? The process of Jena reading one of those files and converting it to TDB is pretty trivial (if it would even need to be converted). And then the sample data could be all text.

ghost commented 4 years ago

That is a good idea. I assume it would be fairly straightforward to create a harvester to consume a directory of ttl files.

nymbyl commented 4 years ago

I think so - I mean, I made one a while ago - at least the ttl-to-TDB conversion was probably less than 10 lines of code. I'm thinking it's possible to not even need that conversion, though - and just query the dataset directly.
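For reference, the conversion really is tiny with Jena. This is just a sketch of the shape, not the actual harvester code - the `triplestore` and `sample-data` paths and the class name are placeholders:

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.tdb.TDBFactory;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TtlToTdb {
    public static void main(String[] args) throws IOException {
        // "triplestore" and "sample-data" are placeholder directory names
        Dataset dataset = TDBFactory.createDataset("triplestore");
        dataset.begin(ReadWrite.WRITE);
        try {
            try (var files = Files.list(Paths.get("sample-data"))) {
                files.filter(p -> p.toString().endsWith(".ttl"))
                     .map(Path::toString)
                     // parse each Turtle file straight into the TDB-backed model
                     .forEach(f -> RDFDataMgr.read(dataset.getDefaultModel(), f));
            }
            dataset.commit();
        } finally {
            dataset.end();
        }
    }
}
```

Skipping TDB entirely would be the same idea with an in-memory model (e.g. `RDFDataMgr.loadModel` per file) and querying that.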

nymbyl commented 4 years ago

It is easy to write a ttl file/directory harvester - I have some code working already, and I'm thinking this is the better route. It's a little slow for the OpenVIVO data, but for the test data I think it'll be fine.

ghost commented 4 years ago

I like the idea: an additional harvester, and git-friendly default triples.

nymbyl commented 4 years ago

Sounds like a plan - I'll finish that up, then. In a perfect world it would read in a lot of different formats (N3, TriG, RDF/XML), but I'm not sure I can go that far this round - especially since it may not be practical for production data anyway.