theq629 closed 10 years ago
I like it. We should do it, but keep some support for aliasing (when needed) for some future use.
This wasn't required for either avherald or the general refactoring, so I am removing it from those milestones and will do it after the domains merge.
Done in the d3andfieldnamecleanup branch.
I added the organization and airport fields to the changes. Airport is only for avherald, the others all apply to both current domains. The changes are thus as follows:
old name | new name |
---|---|
dbid | id |
eventRoot | predicate |
locationText | location |
currentCountryText | currentcountry |
personText | person |
categoryText | category |
organizationText | organization |
airportText | airport |
The field alias system is unchanged, but there are currently no aliases set for either domain.
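For anyone updating stored records or config by hand, the table above can be expressed as a simple mapping. This is only a sketch: `FIELD_RENAMES` and `rename_fields` are hypothetical names, not part of the codebase, the sample record is made up, and the snippet is written for Python 3.

```python
# Old-to-new field names, copied from the table above.
FIELD_RENAMES = {
    "dbid": "id",
    "eventRoot": "predicate",
    "locationText": "location",
    "currentCountryText": "currentcountry",
    "personText": "person",
    "categoryText": "category",
    "organizationText": "organization",
    "airportText": "airport",
}


def rename_fields(record):
    """Return a copy of record with old field names replaced by new ones.

    Hypothetical helper for illustration. Fields without an entry in
    FIELD_RENAMES keep their existing name.
    """
    return {FIELD_RENAMES.get(name, name): value for name, value in record.items()}


# Made-up record, only to show the effect of the mapping.
old_record = {"dbid": 7, "personText": ["Some Person"], "title": "Example"}
new_record = rename_fields(old_record)
```

Fields such as `title` that are not in the table pass through untouched, which matches the fact that only the listed names changed.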
This touches many lines in some files of the shared backend code and the domain-specific code for both domains. It does not touch the shared frontend code. It does not touch the data preparation layer or the Json data format.
There seems to be a problem with avherald under these new settings. For instance, have a look at the current lensingaviation site under the following avherald_config_defaults.js:
    facets = {
        "role": "Role",
        "event": "Predicate",
        "organization": "Organization",
        "location": "Location",
        "category": "Category"
    };
It looks like it reports a backend error for the empty query but not after adding a constraint. I wonder if maybe the backend cache settings still have it using non-existent fields. Do you have the backend error message? Is there any chance you have old config files that use the old names?
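One quick way to rule out stale config is to scan the config text for the old names. The following is a rough sketch under assumptions: `find_stale_field_names` is a hypothetical helper, the sample string is made up, and it simply does a whole-word search over text rather than parsing the config (Python 3).

```python
import re

# Old field names from the rename table in this issue.
OLD_FIELD_NAMES = [
    "dbid",
    "eventRoot",
    "locationText",
    "currentCountryText",
    "personText",
    "categoryText",
    "organizationText",
    "airportText",
]


def find_stale_field_names(config_text):
    """Return the old field names that occur as whole words in config_text."""
    return [
        name
        for name in OLD_FIELD_NAMES
        if re.search(r"\b" + re.escape(name) + r"\b", config_text)
    ]


# Made-up config fragment that still uses one old name.
sample = 'facets = {"personText": "Person", "location": "Location"};'
stale = find_stale_field_names(sample)
```

Running this over the frontend defaults and the backend instance settings would show whether any file still refers to a field that no longer exists in the index.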
Do you mean old config files for the backend?
The avherald backend config looks like this:
    def get_facet_field_values(event):
        """
        Get values for the extra keyword field values.
        """
        locationLocation = set(v['title'] for v in event['locations'].itervalues()) if 'locations' in event else set()
        wikiInfoLocation = set(v['title'] for v in event['wiki_info'].itervalues() if 'latitude' in v and 'longitude' in v) if 'wiki_info' in event else set()
        airport = set(v['title'] for v in event['airport'].itervalues()) if 'airport' in event else set()
        values = {
            'title': event['title'],
            'url': event['url'],
            'sentence': event['sentence']['text'],
            'sentenceSpan': [str(i) for i in event['sentence']['span']],
            'event': event['event'][1],
            'eventSpan': [str(i) for i in event['event'][0]],
            'descriptionReplacements': format_replacement(event['wiki_info']),
            'location': locationLocation | wikiInfoLocation | airport,
            'airport': airport,
            'currentcountry': [v['country'] for (k, v) in event['locations'].iteritems() if 'country' in v] if 'locations' in event else [],
            'organization': [v['title'] for v in event['organization'].itervalues()] if 'organization' in event else [],
            'person': [v['title'] for v in event['person'].itervalues()] if 'person' in event else [],
            'category': [c for v in (event['wiki_info'].itervalues() if 'wiki_info' in event else []) if 'category' in v for c in v['category']]
        }
        return values
and
    facet_field_names = ['title', 'url', 'sentence', 'sentenceSpan', 'event', 'eventSpan', 'descriptionReplacements', 'location', 'airport', 'currentcountry', 'organization', 'person', 'category']
    description_field_names = ['title', 'url', 'sentence', 'sentenceSpan', 'description', 'descriptionReplacements', 'sentence', 'id', 'predicate', 'event', 'eventSpan', 'year']
I'm especially thinking of the backend instance settings (that might set fields_to_prime).
Also, what's the data you are running on right now? I'll try to replicate tomorrow.
/cs/natlang-projects/users/maryam/avherald/fullData.json
I managed to roughly replicate this once, at which point it appeared that the backend was not printing any error message (which should be impossible in this situation). Since then I have been unable to replicate it.
Can you please check the stderr output of the backend on the live site and see if there is any error report?
Maybe it's an error we haven't caught before? Perhaps due to the large size of the avherald data. I'm going to restart the server and see what happens.
After a clean restart, it seems to be working fine. Let's close it for now and see if it happens again.
Ok, I'm also thinking it's some general backend bug that we just haven't caught before.
By the way, the "event" and "predicate" fields are different, so make sure you have the facet on the one you want.
I think the live site has the right version. But have a look and let me know if otherwise.
And I pushed a change to avherald_config_defaults.js to reflect the live site. Also to wikipediahistory previously.
Looks right to me now.
I'd like to make the following changes to the field names in the index:
This will replace the field name aliasing for text searches that is currently in use.
The -Text suffixes on the keyword fields exist because in the SimpleDB version we were keeping the Json data as text in a field with the base name (e.g. 'person') and extracting the titles to the suffixed field (e.g. 'personText'). In the Whoosh version we don't keep the Json data (see issue #57), so we can get rid of the suffixes and avoid aliasing.
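To make the difference concrete, here is a rough sketch of the two layouts for a single keyword field. Both functions are hypothetical illustrations, not the actual SimpleDB or Whoosh code. The entity structure (a dict of entries each carrying a 'title') is assumed from the avherald config quoted in this thread, and the sample entry is made up (Python 3).

```python
import json


def old_style_fields(name, entities):
    """Assumed old layout: raw Json under the base name, titles under name + 'Text'."""
    titles = sorted(v["title"] for v in entities.values())
    return {name: json.dumps(entities, sort_keys=True), name + "Text": titles}


def new_style_fields(name, entities):
    """Assumed new layout: no Json is stored, so the titles take the base name."""
    return {name: sorted(v["title"] for v in entities.values())}


# Made-up entity entry keyed by a made-up id.
people = {"p1": {"title": "Some Person"}}
old_fields = old_style_fields("person", people)
new_fields = new_style_fields("person", people)
```

With the Json field gone, the base name is free, so a text search on `person` can hit the titles directly without an alias from `person` to `personText`.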
I'm making an issue for this since it will touch the frontend code as well as the backend.