srophe / srophe-eXist-app

DEPRECATED eXist code for Syriaca.org: The Syriac Reference Portal
GNU General Public License v3.0

Data counts in SPEAR #1111

Closed dlschwartz closed 4 years ago

dlschwartz commented 5 years ago

@wsalesky There are a number of things that don't seem to be displaying correctly in the counts, or maybe they just aren't showing up in the display at all. I'm not sure how recent this is. I've thought that some things might be missing but I wasn't sure if it was a data problem or a bug. Now that this data is mostly cleaned it seems pretty clear it's either a bug or something I don't understand. I'm worried that this might take some effort to figure out. I think the first thing to check actually is that the re-running of the RDF worked. I'm not exactly sure how to check that.

Counts in Oxygen

Lives: 1074 divs/factoids
  649 listPersons/person factoids
  340 listEvents/event factoids
  85 listRelations/relation factoids

Letters: 1202 divs/factoids
  712 listPersons/person factoids
  159 listEvents/event factoids
  331 listRelations/relation factoids

Chronicles: 551 divs/factoids
  355 listPersons/person factoids
  171 listEvents/event factoids
  25 listRelations (not nested listEvents)/relation factoids
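For anyone who wants to reproduce counts like these outside Oxygen, here is a minimal sketch. It assumes each factoid is a TEI `div` whose direct child is a `listPerson`, `listEvent`, or `listRelation`; the sample XML, the person URI, and the function name `count_factoids` are made up for illustration and are not taken from the SPEAR data or code. It also checks that the reported sub-counts add up to the reported totals.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical miniature file with three factoid divs (not real SPEAR data).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
  <div><listPerson><person><persName ref="http://syriaca.org/person/1"/></person></listPerson></div>
  <div><listEvent><event><desc>An event</desc></event></listEvent></div>
  <div><listRelation><relation name="example"/></listRelation></div>
</body></text></TEI>"""


def count_factoids(xml_text):
    """Count divs, and how many have each kind of list as a direct child."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for div in root.iter(TEI + "div"):
        counts["div"] += 1
        for kind in ("listPerson", "listEvent", "listRelation"):
            # find() with a bare tag name looks at direct children only, so a
            # listRelation nested inside a listEvent is not counted here.
            if div.find(TEI + kind) is not None:
                counts[kind] += 1
    return counts


# Figures reported in this issue: (divs, person, event, relation factoids).
REPORTED = {
    "Lives": (1074, 649, 340, 85),
    "Letters": (1202, 712, 159, 331),
    "Chronicles": (551, 355, 171, 25),
}

counts = count_factoids(SAMPLE)
print(dict(counts))
for name, (total, persons, events, relations) in REPORTED.items():
    print(name, total == persons + events + relations)
```

The three reported sets are internally consistent (each total equals the sum of its parts), so any mismatch on the site is more likely to come from the RDF or the queries than from these tallies.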

Compare with all: persons, events, relations

wsalesky commented 5 years ago

@dlschwartz I will take a look. It looks like a SPARQL problem to me. Maybe the wrong things are getting counted.

dlschwartz commented 5 years ago

Maybe, but there is also legacy data being visualized. It's complicated so I'll make another issue as I'm not sure how this relates to what I've done in this issue.

dlschwartz commented 5 years ago

@wsalesky Things are looking pretty good. I can't see counts though because the sources aren't showing up: [screenshot: sourcesspear]

Also, on the browse page, the persons and places are being moved to the end of the sentence: [screenshot: prose]

However, the prose comes out fine on the factoid page: [screenshot: factoidpage]

wsalesky commented 5 years ago

@dlschwartz Still a work in progress, I ran into a nasty bug last night so I didn't get everything worked out as I wanted.

dlschwartz commented 5 years ago

@wsalesky Very good. I wasn't sure where things stood but thought I'd offer some feedback. Thanks!

wsalesky commented 5 years ago

@dlschwartz As an update, I have rerun the data but am still getting bad counts, and I cannot figure out what is causing the malformed labels. They show up fine when I run them one at a time. So, still debugging. Sorry it is taking so long.

Also, I think a future development will be to split the facets out to make them run faster, with each facet being a single SPARQL request. Right now they are all submitted together, which is slow.
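A rough sketch of the one-request-per-facet idea, in Python purely for illustration. The facet names, the placeholder query strings, and the functions `fetch_facets` and `fake_run_query` are all hypothetical; the real SPEAR queries and the code that issues them are not shown in this thread. The query runner is passed in as a parameter so the sketch runs without a SPARQL endpoint.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical facets; each value stands in for that facet's SPARQL text.
FACET_QUERIES = {
    "persons": "# SPARQL for the persons facet would go here",
    "events": "# SPARQL for the events facet would go here",
    "relations": "# SPARQL for the relations facet would go here",
}


def fetch_facets(facet_queries, run_query, max_workers=4):
    """Submit one query per facet concurrently; return {facet: result}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(run_query, query)
            for name, query in facet_queries.items()
        }
        # result() blocks until that facet's request has finished.
        return {name: future.result() for name, future in futures.items()}


def fake_run_query(query):
    """Stand-in for an HTTP call to a SPARQL endpoint (no network used)."""
    return {"query": query, "rows": []}


results = fetch_facets(FACET_QUERIES, fake_run_query)
print(sorted(results))
```

With separate requests, a slow facet no longer holds up the others, and each facet can be rendered as soon as its own result arrives.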

dlschwartz commented 5 years ago

@wsalesky Not slow at all Winona. I'm worried I'm taking you away from your kids on the last week of their summer break. We'll get this sorted. Thanks Winona!

wsalesky commented 5 years ago

@dlschwartz I think I now have the correct data. Just need to troubleshoot the SPARQL queries.

dlschwartz commented 5 years ago

@wsalesky Thanks! Things are looking great.

wsalesky commented 5 years ago

@dlschwartz I think it is fixed. However, the 'Persons' tab is a little odd. The facet counts are based on the number of person factoids, but the results just show the Person, so the counts look off if there are multiple matching factoids about a particular person. Do we want to show the factoids on this page instead of just the person?

Also, tomorrow I think I will try to speed up the facets. I think making a single request for each facet should do the trick, but it will take a little refactoring. I will do the work locally so I do not break anything on dev.

Let me know if I missed anything!

dlschwartz commented 5 years ago

@wsalesky This is an interesting problem. I'm not sure that the list of individual factoids is very useful to anyone. Dave tends to want this kind of thing though. I tend not to want it.

Options

  1. Leave things the way they are. I don't think it's really a problem to display the results grouped in this way.

  2. We could display the count of unique persons about whom we have person factoids instead of the count of factoids? That would be a count of unique values of //div/listPerson/person/persName/@ref. Here Dave does have a point about raw data vs. curated data.

  3. We could work out some way to display in the browse results the person factoids grouped by the person they are about. If we were to go this way, I'm not sure we should do that right now.

I think that in the short run the options are 1 or 2. Perhaps we just leave the status quo and return to this issue later. Let me know if you have any additional thoughts on this.
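To make the difference between the current counts and option 2 concrete, here is a small sketch that counts person factoids and unique values of `//div/listPerson/person/persName/@ref` side by side. The sample XML and URIs are invented, the function name `person_counts` is hypothetical, and it assumes one `persName` with a `@ref` per person factoid.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical data: three person factoids about two distinct persons.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
  <div><listPerson><person><persName ref="http://syriaca.org/person/1"/></person></listPerson></div>
  <div><listPerson><person><persName ref="http://syriaca.org/person/1"/></person></listPerson></div>
  <div><listPerson><person><persName ref="http://syriaca.org/person/2"/></person></listPerson></div>
</body></text></TEI>"""


def person_counts(xml_text):
    """Return (number of person factoids, number of unique persons)."""
    root = ET.fromstring(xml_text)
    names = root.findall(
        ".//tei:div/tei:listPerson/tei:person/tei:persName[@ref]", NS
    )
    refs = [name.get("ref") for name in names]
    # len(refs) is the factoid-based count; len(set(refs)) is option 2.
    return len(refs), len(set(refs))


factoids, unique_persons = person_counts(SAMPLE)
print(factoids, unique_persons)  # 3 2
```

In this toy case the facet would read 3 under the current behavior but 2 under option 2, which matches the two results a user would actually see listed.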

wsalesky commented 5 years ago

@dlschwartz I need to do some more SPARQL experiments. I was aiming for option 2 but got some very odd results. I think step 1 is faster facets, and step 2 will be to address this. I will probably hold off until next week unless you feel it is a real problem for your paper.

dlschwartz commented 5 years ago

@wsalesky No problem. Holding off is fine.

wsalesky commented 4 years ago

@dlschwartz closing this as a stale issue.