stuckyb / ontopilot

inferred type exclusions #69

Closed stuckyb closed 7 years ago

stuckyb commented 7 years ago

OntoPilot should be able to exclude one or more user-specified types from the final set of inferred class assertions (type assertions) that are generated when reasoning on an ontology/data set. This could save a lot of space for large data files.
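As a rough illustration of the idea (not OntoPilot's actual code), the exclusion can be thought of as a filter over the inferred `rdf:type` triples. The function name `exclude_inferred_types`, the tuple representation of triples, and the `ex:` identifiers below are all assumptions made for this sketch.

```python
# Minimal sketch: drop inferred type assertions for user-specified classes.
# Triples are modeled as plain (subject, predicate, object) string tuples;
# this is an assumption for illustration, not OntoPilot's data model.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_THING = "http://www.w3.org/2002/07/owl#Thing"


def exclude_inferred_types(inferred_triples, excluded_types):
    """Return the triples, minus rdf:type assertions to any excluded class."""
    excluded = frozenset(excluded_types)
    return [
        (s, p, o)
        for (s, p, o) in inferred_triples
        # Only type assertions are filtered; other triples pass through.
        if not (p == RDF_TYPE and o in excluded)
    ]


# Hypothetical inferred output for a single record.
inferred = [
    ("ex:obs1", RDF_TYPE, OWL_THING),
    ("ex:obs1", RDF_TYPE, "ex:FloweringObservation"),
    ("ex:obs1", "ex:hasValue", "ex:val1"),
]

kept = exclude_inferred_types(inferred, [OWL_THING])
```

Because a reasoner typically asserts very general types for every individual, dropping even one such type removes one triple per individual, which is where the space savings for large data files would come from.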

stuckyb commented 7 years ago

@jdeck88, I'm working on this issue now. Do you have some current test data somewhere that I could work with (i.e., data that are typical of what the ingest pipeline is generating)? I need the triplified data before they are run through the reasoner.

stuckyb commented 7 years ago

@jdeck88 -- To explain a bit more, I could use the test data in the ppo_pre_reasoner repository, but I don't think they have been updated since we made all of the changes in Boulder.

jdeck88 commented 7 years ago

Here you go. This one only has two records; let me know if you need a test dataset with more rows.

stuckyb commented 7 years ago

Thanks, John. And yes, a test dataset with more rows would be awesome. Something big so I can test size reduction across a bunch of data points. I can't remember if you said you're breaking things up into 5,000 or 50,000 record chunks, but if you could add one of those, that would be great.

stuckyb commented 7 years ago

Wait -- where did you put the test data? In the ppo_pre_reasoner repository?

jdeck88 commented 7 years ago

OK, I attached the previous file to the email thread, but I'm not sure GitHub's reply-by-email function supports attachments. Here is a response with the file actually attached (and 50,000 incoming records): test_50000.csv.ttl.gz

stuckyb commented 7 years ago

Excellent. Thanks!

stuckyb commented 7 years ago

This is now working in the phenology data ingest pipeline, so this feature is done.