sfu-natlang / lensingwikipedia

Lensing Wikipedia is an interface to visually browse through human history as represented in Wikipedia. This is the source code that runs the website:
http://lensingwikipedia.cs.sfu.ca

Renaming internal fields #81

Closed theq629 closed 10 years ago

theq629 commented 10 years ago

I'd like to make the following changes to the field names in the index:

| old name | new name |
| --- | --- |
| dbid | id |
| eventRoot | predicate |
| locationText | location |
| currentCountryText | currentcountry |
| personText | person |
| categoryText | category |

This will replace the field name aliasing for text searches that is currently in use.

The -Text suffixes on the keyword fields are there because in the SimpleDB version we were keeping the JSON data as text in a field with the base name (e.g. 'person') and extracting the titles to the suffixed field (e.g. 'personText'). In the Whoosh version we don't keep the JSON data (see issue #57), so we can get rid of the suffixes and avoid aliasing.
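As a rough illustration of the rename (a sketch, not the project's actual code), the mapping in the table above could be applied to a single index record as follows. The names `FIELD_RENAMES` and `rename_fields` are hypothetical and exist only for this example, and the sample record is made up.

```python
# Hypothetical mapping copied from the table above: old field name -> new field name.
FIELD_RENAMES = {
    'dbid': 'id',
    'eventRoot': 'predicate',
    'locationText': 'location',
    'currentCountryText': 'currentcountry',
    'personText': 'person',
    'categoryText': 'category',
}

def rename_fields(record, renames=FIELD_RENAMES):
    """Return a copy of record with old field names replaced by their new names.

    Fields that are not in the mapping are copied through unchanged.
    """
    return dict((renames.get(name, name), value) for name, value in record.items())

# Made-up record, for illustration only.
old_record = {'dbid': 7, 'personText': ['Some Person'], 'year': 1804}
new_record = rename_fields(old_record)
```

Here `new_record` carries the keys `id`, `person` and `year`, and `old_record` is left untouched.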

I'm making an issue for this since it will touch the frontend code as well as the backend.

anoopsarkar commented 10 years ago

I like it. We should do it, but keep some support for aliasing (when needed) for some future use.

theq629 commented 10 years ago

This wasn't required for either avherald or the general refactoring, so I am removing it from those milestones and will do it after the domains merge.

theq629 commented 10 years ago

Done in the d3andfieldnamecleanup branch.

I added the organization and airport fields to the changes. Airport is only for avherald; the others all apply to both current domains. The changes are thus as follows:

| old name | new name |
| --- | --- |
| dbid | id |
| eventRoot | predicate |
| locationText | location |
| currentCountryText | currentcountry |
| personText | person |
| categoryText | category |
| organizationText | organization |
| airportText | airport |

The field alias system is unchanged, but there are currently no aliases set for either domain.
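To make the role of aliasing concrete, here is a minimal sketch of how an alias lookup with an empty alias table behaves. This is an assumption about the general idea, not the backend's real alias code; `FIELD_ALIASES` and `resolve_field` are hypothetical names.

```python
# Hypothetical alias table: name used in a text query -> field name in the index.
# It is empty here, mirroring the statement that no aliases are currently set.
FIELD_ALIASES = {}

def resolve_field(name, aliases=FIELD_ALIASES):
    """Map a field name from a query to an index field name.

    Names without an alias are passed through unchanged, so an empty
    alias table makes this the identity function.
    """
    return aliases.get(name, name)
```

With no aliases set, `resolve_field('person')` simply returns `'person'`. If an alias were needed later, an entry such as `{'who': 'person'}` (made up for illustration) would map `'who'` to `'person'`.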

This touches many lines in some files of the shared backend code and the domain-specific code for both domains. It does not touch the shared frontend code. It does not touch the data preparation layer or the JSON data format.

anoopsarkar commented 10 years ago

There seems to be a problem with avherald under these new settings. For instance, have a look at the current lensingaviation site with the following avherald_config_defaults.js:

facets = {
  "role": "Role",
  "event": "Predicate",
  "organization": "Organization",
  "location": "Location",
  "category": "Category"
};

theq629 commented 10 years ago

It looks like it reports a backend error for the empty query but not after adding a constraint. I wonder if maybe the backend cache settings still have it using non-existent fields. Do you have the backend error message? Is there any chance you have old config files that use the old names?

anoopsarkar commented 10 years ago

do you mean old config files for the backend?

the avherald backend config looks like this:

def get_facet_field_values(event):
  """
  Get values for the extra keyword field values.
  """
  locationLocation = set(v['title'] for v in event['locations'].itervalues()) if 'locations' in event else set()
  wikiInfoLocation = set(v['title'] for v in event['wiki_info'].itervalues() if 'latitude' in v and 'longitude' in v) if 'wiki_info' in event else set()
  airport = set(v['title'] for v in event['airport'].itervalues()) if 'airport' in event else set()
  values = {
    'title': event['title'],
    'url': event['url'],
    'sentence': event['sentence']['text'],
    'sentenceSpan': [str(i) for i in event['sentence']['span']],
    'event': event['event'][1],
    'eventSpan': [str(i) for i in event['event'][0]],
    'descriptionReplacements': format_replacement(event['wiki_info']),
    'location': locationLocation | wikiInfoLocation | airport,
    'airport': airport,
    'currentcountry': [v['country'] for (k, v) in event['locations'].iteritems() if 'country' in v] if 'locations' in event else [],
    'organization': [v['title'] for v in event['organization'].itervalues()] if 'organization' in event else [],
    'person': [v['title'] for v in event['person'].itervalues()] if 'person' in event else [],
    'category': [c for v in (event['wiki_info'].itervalues() if 'wiki_info' in event else []) if 'category' in v for c in v['category']]
  }
  return values

and

facet_field_names = ['title', 'url', 'sentence', 'sentenceSpan', 'event', 'eventSpan', 'descriptionReplacements', 'location', 'airport', 'currentcountry', 'organization', 'person', 'category']
description_field_names = ['title', 'url', 'sentence', 'sentenceSpan', 'description', 'descriptionReplacements', 'sentence', 'id', 'predicate', 'event', 'eventSpan', 'year']

theq629 commented 10 years ago

I'm especially thinking of the backend instance settings (that might set fields_to_prime).
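One way to check for this is to scan a settings list for any of the pre-rename names. The sketch below assumes that a setting such as fields_to_prime is a flat list of field names, which this thread does not confirm; `OLD_FIELD_NAMES` and `find_stale_fields` are hypothetical names for this example.

```python
# Field names that existed before the rename (from the tables earlier in this issue).
OLD_FIELD_NAMES = set([
    'dbid', 'eventRoot', 'locationText', 'currentCountryText',
    'personText', 'categoryText', 'organizationText', 'airportText',
])

def find_stale_fields(configured_fields):
    """Return, in their original order, the configured names that are pre-rename names."""
    return [name for name in configured_fields if name in OLD_FIELD_NAMES]

# Made-up settings value, for illustration only.
stale = find_stale_fields(['year', 'personText', 'location'])
```

A non-empty result (here `stale` contains only `'personText'`) would point at an instance setting that still refers to a field that no longer exists in the index.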

Also, what's the data you are running on right now? I'll try to replicate tomorrow.

anoopsarkar commented 10 years ago

/cs/natlang-projects/users/maryam/avherald/fullData.json

theq629 commented 10 years ago

I managed to roughly replicate this once, at which point it appeared that the backend was not printing any error message (which should be impossible in this situation). Since then I have been unable to replicate it.

Can you please check the stderr output of the backend on the live site and see if there is any error report?

anoopsarkar commented 10 years ago

Maybe it's an error we haven't caught before? Perhaps due to the large size of the avherald data. I'm going to restart the server and see what happens.

anoopsarkar commented 10 years ago

after a clean restart, it seems to be working fine. let's close it for now and see if it happens again.

theq629 commented 10 years ago

Ok, I'm also thinking it's some general backend bug that we just haven't caught before.

theq629 commented 10 years ago

By the way, the "event" and "predicate" fields are different, so make sure you have the facet on the one you want.

anoopsarkar commented 10 years ago

I think the live site has the right version. But have a look and let me know if otherwise.

anoopsarkar commented 10 years ago

And I pushed a change to avherald_config_defaults.js to reflect the live site. Also to wikipediahistory previously.

theq629 commented 10 years ago

Looks right to me now.