Closed stoduk closed 8 years ago
With the fix in place, I tested with a new book (so there is no saved alias data). In the log below, self._aliases "before" is empty and "after" is populated, yet we no longer end up in the broken scenario above: we get BookParser._aliases = {}, rather than a "": "some character" entry.
Book._parse_shelfari_data:
{1: {'label': 'Narrator', 'description': 'A witty, satirical voice who is Vonnegut himself.'}, 2: {'label': 'Billy Pilgrim', 'description': 'The main character of the story, a World War II vet who was a POW in the bombing of Dresden. He shifts through time, and at one point was abducted by aliens.'}, 3: {'label': 'Roland Weary', 'description': "Short and stocky antitank gunner during the war and Billy's comrade. He was obsessed with torture devices and gore, in his dying words he blamed Billy for his death."}, 4: {'label': 'Paul Lazzaro', 'description': "A vengeful fellow prisoner-of-war who blamed Billy for Weary's death who said he can have anyone killed for a thousand dollars and would have Billy killed."}, 5: {'label': 'Kilgore Trout', 'description': 'Science fiction writer who Billy meets in an alley way.'}, 6: {'label': 'Edgar Derby', 'description': "A prisoner of war with Billy, an older fellow who was a teacher, and wanted to come to war because he couldn't stand seeing all his students go off to war while he wasted away."}, 7: {'label': 'Howard W. Campbell, Jr.', 'description': 'A Nazi who was an American and wrote several training documents to help the Germans better understand their enemy. and tried to end the war.'}, 8: {'label': 'Valencia Merble', 'description': "Billy's wife, a rich fat woman."}, 9: {'label': 'Robert Pilgrim', 'description': "Billy's son who fought in the Green Berets during the Vietnam war."}, 10: {'label': 'Barbara Pilgrim', 'description': "Billy's daughter who thinks her father is crazy and treats him like a child. 
She is attractive with the exception of her legs which are shaped like those of an Edwardian grand piano."}, 11: {'label': 'Montana Wildhack', 'description': 'Famous movie actress who Billy is put on display with at the zoo on Tralfamadore'}, 12: {'label': 'Eliot Rosewater', 'description': "Billy's friend and fellow patient who introduces him to Kilgore Trout."}, 13: {'label': 'Bertram Copeland Rumfoord', 'description': 'An aging Harvard history professor, retired Air Force brigadier general, and millionaire, who shares a hospital room with Billy and asks Billy to tell him stories about his experience in the Dresden Bombing.'}, 14: {'label': "Mary O'Hare", 'description': "Bernard O'Hare's wife and one of the people to whom the book is dedicated."}, 15: {'label': 'Billy Pilgrim', 'description': 'A man who is symbolic of the absurdities of war. His nonsensical demeanor facilitates his journey through time and space, ultimately articulating an anti-war message.'}, 16: {'label': "Bernard V. O'Hare", 'description': "The narrator's former war comrade."}, 17: {'label': 'Tralfamadorians', 'description': 'Aliens shaped like toilet plungers, each with one hand containing an eye in its palm.'}, 18: {'label': 'Eddie D. Slovik', 'description': 'The only American soldier to be shot for cowardice since the Civil War.'}}
{19: {'label': 'Dresden, Germany', 'description': 'Billy Pilgrim as part of approximately one hundred American prisoners-of-war are transported there to serve as contract laborers. The city was the last city in Germany to be carpet bombed.'}, 20: {'label': 'Tralfamadore', 'description': "Billy Pilgrim was abducted by aliens, taken to the planet Tralfamadore, and displayed naked in a zoo. The Tralfamodorians are shaped similar to toilet plungers. The shaft, which is extremely flexible, is topped by a small hand with one green eye in its palm. This planet's inhabitants see in four dimensions rather than three."}, 21: {'label': 'Ilium, New York', 'description': 'Much of this novel when not set in Dresden is set in Ilium where Billy grew up and established his optometrist office.'}, 22: {'label': 'Slaughterhouse Five', 'description': 'A former slaughterhouse used during World War II in Dresden to house American prisoners-of-war.'}, 23: {'label': 'Salmon Roe', 'description': 'Salmon eggs, better known as red caviar.'}, 24: {'label': 'Thumbscrew', 'description': 'An instrument of torture consisting of a ring into which the thumb is inserted and a screw that is then tightened gradually, until the bones are shattered.'}}
self._aliases before: {}
self._aliases after: {'Tralfamadorians': [], 'Montana Wildhack': [], 'Kilgore Trout': [], 'Edgar Derby': [], 'Howard W. Campbell, Jr.': [], 'Eliot Rosewater': [], "Mary O'Hare": [], 'Robert Pilgrim': [], 'Bertram Copeland Rumfoord': [], 'Billy Pilgrim': [], 'Valencia Merble': [], 'Tralfamadore': [], 'Dresden, Germany': [], 'Slaughterhouse Five': [], 'Narrator': [], 'Paul Lazzaro': [], 'Eddie D. Slovik': [], "Bernard V. O'Hare": [], 'Thumbscrew': [], 'Ilium, New York': [], 'Salmon Roe': [], 'Roland Weary': [], 'Barbara Pilgrim': []}
BookParser: self._aliases: {}
Job: 6 Creating X-Ray Files finished
Starting job: Creating X-Ray Files
05-17-2016 16:59:59 Slaughterhouse Five - Kurt Vonnegut
05-17-2016 16:59:59 Getting ASIN...
05-17-2016 17:00:05 Getting shelfari url...
05-17-2016 17:00:41 Parsing shelfari data...
05-17-2016 17:00:43 Getting format specific data...
05-17-2016 17:00:43 Parsing book data...
05-17-2016 17:00:46 Creating x-ray...
05-17-2016 17:00:46 Sending x-ray to device...
X-Ray Creation:
Books Completed:
Slaughterhouse Five - Kurt Vonnegut: MOBI
X-Ray Sending:
Books Failed:
Slaughterhouse Five - Kurt Vonnegut:
MOBI: No device is connected.
Book._parse_shelfari_data:
{1: {'label': 'Narrator', 'description': 'A witty, satirical voice who is Vonnegut himself.'}, 2: {'label': 'Billy Pilgrim', 'description': 'The main character of the story, a World War II vet who was a POW in the bombing of Dresden. He shifts through time, and at one point was abducted by aliens.'}, 3: {'label': 'Roland Weary', 'description': "Short and stocky antitank gunner during the war and Billy's comrade. He was obsessed with torture devices and gore, in his dying words he blamed Billy for his death."}, 4: {'label': 'Paul Lazzaro', 'description': "A vengeful fellow prisoner-of-war who blamed Billy for Weary's death who said he can have anyone killed for a thousand dollars and would have Billy killed."}, 5: {'label': 'Kilgore Trout', 'description': 'Science fiction writer who Billy meets in an alley way.'}, 6: {'label': 'Edgar Derby', 'description': "A prisoner of war with Billy, an older fellow who was a teacher, and wanted to come to war because he couldn't stand seeing all his students go off to war while he wasted away."}, 7: {'label': 'Howard W. Campbell, Jr.', 'description': 'A Nazi who was an American and wrote several training documents to help the Germans better understand their enemy. and tried to end the war.'}, 8: {'label': 'Valencia Merble', 'description': "Billy's wife, a rich fat woman."}, 9: {'label': 'Robert Pilgrim', 'description': "Billy's son who fought in the Green Berets during the Vietnam war."}, 10: {'label': 'Barbara Pilgrim', 'description': "Billy's daughter who thinks her father is crazy and treats him like a child. 
She is attractive with the exception of her legs which are shaped like those of an Edwardian grand piano."}, 11: {'label': 'Montana Wildhack', 'description': 'Famous movie actress who Billy is put on display with at the zoo on Tralfamadore'}, 12: {'label': 'Eliot Rosewater', 'description': "Billy's friend and fellow patient who introduces him to Kilgore Trout."}, 13: {'label': 'Bertram Copeland Rumfoord', 'description': 'An aging Harvard history professor, retired Air Force brigadier general, and millionaire, who shares a hospital room with Billy and asks Billy to tell him stories about his experience in the Dresden Bombing.'}, 14: {'label': "Mary O'Hare", 'description': "Bernard O'Hare's wife and one of the people to whom the book is dedicated."}, 15: {'label': 'Billy Pilgrim', 'description': 'A man who is symbolic of the absurdities of war. His nonsensical demeanor facilitates his journey through time and space, ultimately articulating an anti-war message.'}, 16: {'label': "Bernard V. O'Hare", 'description': "The narrator's former war comrade."}, 17: {'label': 'Tralfamadorians', 'description': 'Aliens shaped like toilet plungers, each with one hand containing an eye in its palm.'}, 18: {'label': 'Eddie D. Slovik', 'description': 'The only American soldier to be shot for cowardice since the Civil War.'}}
{19: {'label': 'Dresden, Germany', 'description': 'Billy Pilgrim as part of approximately one hundred American prisoners-of-war are transported there to serve as contract laborers. The city was the last city in Germany to be carpet bombed.'}, 20: {'label': 'Tralfamadore', 'description': "Billy Pilgrim was abducted by aliens, taken to the planet Tralfamadore, and displayed naked in a zoo. The Tralfamodorians are shaped similar to toilet plungers. The shaft, which is extremely flexible, is topped by a small hand with one green eye in its palm. This planet's inhabitants see in four dimensions rather than three."}, 21: {'label': 'Ilium, New York', 'description': 'Much of this novel when not set in Dresden is set in Ilium where Billy grew up and established his optometrist office.'}, 22: {'label': 'Slaughterhouse Five', 'description': 'A former slaughterhouse used during World War II in Dresden to house American prisoners-of-war.'}, 23: {'label': 'Salmon Roe', 'description': 'Salmon eggs, better known as red caviar.'}, 24: {'label': 'Thumbscrew', 'description': 'An instrument of torture consisting of a ring into which the thumb is inserted and a screw that is then tightened gradually, until the bones are shattered.'}}
self._aliases before: {u'Tralfamadorians': [], u'Barbara Pilgrim': [], u'Montana Wildhack': [], u'Kilgore Trout': [], u'Edgar Derby': [], u'Howard W. Campbell, Jr.': [], u'Eliot Rosewater': [], u"Mary O'Hare": [], u'Bertram Copeland Rumfoord': [], u'Billy Pilgrim': [], u'Valencia Merble': [], u'Tralfamadore': [], u'Dresden, Germany': [], u'Slaughterhouse Five': [], u'Narrator': [], u'Paul Lazzaro': [], u'Eddie D. Slovik': [], u"Bernard V. O'Hare": [], u'Thumbscrew': [], u'Ilium, New York': [], u'Salmon Roe': [], u'Roland Weary': [], u'Robert Pilgrim': []}
self._aliases after: {u'Tralfamadorians': [], u'Barbara Pilgrim': [], u'Montana Wildhack': [], u'Kilgore Trout': [], u'Edgar Derby': [], u'Howard W. Campbell, Jr.': [], u'Eliot Rosewater': [], u"Mary O'Hare": [], u'Bertram Copeland Rumfoord': [], u'Billy Pilgrim': [], u'Valencia Merble': [], u'Tralfamadore': [], u'Dresden, Germany': [], u'Slaughterhouse Five': [], u'Narrator': [], u'Paul Lazzaro': [], u'Eddie D. Slovik': [], u"Bernard V. O'Hare": [], u'Thumbscrew': [], u'Ilium, New York': [], u'Salmon Roe': [], u'Roland Weary': [], u'Robert Pilgrim': []}
BookParser: self._aliases: {}
Job: 7 Creating X-Ray Files finished
Starting job: Creating X-Ray Files
05-17-2016 17:01:46 Slaughterhouse Five - Kurt Vonnegut
05-17-2016 17:01:46 Parsing shelfari data...
05-17-2016 17:01:48 Getting format specific data...
05-17-2016 17:01:48 Parsing book data...
05-17-2016 17:01:51 Creating x-ray...
05-17-2016 17:01:51 Sending x-ray to device...
X-Ray Creation:
Books Completed:
Slaughterhouse Five - Kurt Vonnegut: MOBI
X-Ray Sending:
Books Failed:
Slaughterhouse Five - Kurt Vonnegut:
MOBI: No device is connected.
sqlite> select id, label, count from entity;
0
1   Narrator    0
3   Roland Wea  20
4   Paul Lazza  10
5   Kilgore Tr  24
6   Edgar Derb  24
7   Howard W.   0
8   Valencia M  4
9   Robert Pil  1
10  Barbara Pi  0
11  Montana Wi  9
12  Eliot Rose  4
13  Bertram Co  2
14  Mary O'Har  0
15  Billy Pilg  112
16  Bernard V.  0
17  Tralfamado  22
18  Eddie D. S  1
19  Dresden, G  1
20  Tralfamado  36
21  Ilium, New  3
22  Slaughterh  0
23  Salmon Roe  1
24  Thumbscrew  1
sqlite>
Here is the diff, to make the logs above make some sense:
diff --git a/lib/book.py b/lib/book.py
index 3348ec9..ecb9006 100644
--- a/lib/book.py
+++ b/lib/book.py
@@ -276,15 +276,20 @@ class Book(object):
             self._parsed_shelfari_data = ShelfariParser(self._shelfari_url, spoilers=self._spoilers)
             self._parsed_shelfari_data.parse()

-            for char in self._parsed_shelfari_data.characters.items():
-                if char[1]['label'] not in self._aliases.keys():
-                    self._aliases[char[1]['label']] = ''
+            print("Book._parse_shelfari_data:")
+            print(self._parsed_shelfari_data.characters)
+            print(self._parsed_shelfari_data.terms)
+            print("self._aliases before: %s" % self._aliases)
+            for char in self._parsed_shelfari_data.characters.values():
+                if char['label'] not in self._aliases.keys():
+                    self._aliases[char['label']] = []

-            for term in self._parsed_shelfari_data.terms.items():
-                if term[1]['label'] not in self._aliases.keys():
-                    self._aliases[term[1]['label']] = ''
+            for term in self._parsed_shelfari_data.terms.values():
+                if term['label'] not in self._aliases.keys():
+                    self._aliases[term['label']] = []

             self._prefs['aliases'] = self._aliases
+            print("self._aliases after: %s" % self._aliases)
         except:
             self._status = self.FAIL
             self._status_message = self.FAILED_COULD_NOT_PARSE_SHELFARI_DATA
diff --git a/lib/book_parser.py b/lib/book_parser.py
index fd7f934..b03b34e 100644
--- a/lib/book_parser.py
+++ b/lib/book_parser.py
@@ -42,6 +42,7 @@ class BookParser(object):
             if term.lower() in self.entity_data.keys():
                 for alias in alias_list:
                     self._aliases[alias.lower()] = term.lower()
+        print("BookParser: self._aliases: %s" % self._aliases)
         words_list = self._aliases.keys() + self.entity_data.keys()
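As a rough illustration of what the book.py hunk above changes, the default-filling logic can be pulled out into a standalone function. This is only a sketch written for Python 3 (the plugin code in the diff is Python 2), and fill_default_aliases is a hypothetical name, not a function in the repository. The sample data is a trimmed subset of the labels in the logs, and the saved alias "Billy" is made up for the example.

```python
def fill_default_aliases(aliases, characters, terms):
    """Give every character/term label an empty alias list unless
    aliases were already saved for it. Mutates and returns `aliases`."""
    for group in (characters, terms):
        for entry in group.values():
            # setdefault only inserts when the label is missing,
            # so previously saved aliases are left untouched.
            aliases.setdefault(entry["label"], [])
    return aliases


# Trimmed sample of the parsed Shelfari data shown in the logs.
characters = {
    2: {"label": "Billy Pilgrim"},
    5: {"label": "Kilgore Trout"},
}
terms = {20: {"label": "Tralfamadore"}}

# New book: nothing saved yet, so every label gets [] (not '').
print(fill_default_aliases({}, characters, terms))

# Saved aliases survive; only the missing labels are added.
print(fill_default_aliases({"Billy Pilgrim": ["Billy"]}, characters, terms))
```

Using a list as the default keeps the value the same type as a saved alias list, so later code that iterates over the aliases sees nothing to iterate over instead of a placeholder string.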
@stoduk Yes, I was thinking about this this morning. I haven't had time to look at it, but I'm pretty sure the code is about the same, so it should be a very similar fix. I haven't run across this without going into the book config first, so I'm confused as to why it happens, because it's basically the same thing.
The fix for #5 works if we follow what I thought was the normal path (get the ASIN, look it up in Shelfari, edit aliases and then parse the book), but it seems you can go straight to parsing the book (by skipping the "book specific preferences" option and going straight to "create/update xrays").
A very similar bug is present there, in a different code path. The setup is similar: ShelfariParser isn't setting up any aliases, so we set self._aliases[character_label] = "". The result is that, once again, BookParser ends up with one character (the last one parsed in BookParser.__init__) having "" as its alias, and so a very large number of matches are recorded against it (any time we have a single space character between two word boundaries).
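To make the failure mode concrete, here is a minimal Python 3 sketch of the alias inversion visible in the BookParser diff context. It is not the plugin's code: invert_aliases is a hypothetical stand-in for the loop in BookParser.__init__, and it assumes the "" placeholder reaches that loop as a list holding one empty string (for example via "".split(","), which returns [""]). That conversion step is an assumption for illustration; the logs do not confirm exactly where it happens.

```python
def invert_aliases(aliases, entity_data):
    """Hypothetical stand-in for the loop in BookParser.__init__:
    map each lower-cased alias to its lower-cased term."""
    inverted = {}
    for term, alias_list in aliases.items():
        if term.lower() in entity_data:
            for alias in alias_list:
                # Later terms overwrite earlier ones that share an alias.
                inverted[alias.lower()] = term.lower()
    return inverted


# Lower-cased entity labels (the values do not matter for this sketch).
entity_data = {"billy pilgrim": {}, "kilgore trout": {}, "roland weary": {}}

# Assumption: '' is turned into [''] before the loop runs.
broken = {label.title(): "".split(",") for label in entity_data}
print(invert_aliases(broken, entity_data))  # {'': 'roland weary'}

# With the [] default there is nothing to invert, so no '' key appears.
fixed = {label.title(): [] for label in entity_data}
print(invert_aliases(fixed, entity_data))  # {}
```

Every label shares the same empty alias, so each one overwrites the previous entry and only the last term processed keeps it, which matches the "last one parsed" behaviour described above.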
See the logs below. The first two are from running "create/update xrays" on the two books without first going via "book specific preferences"; both of these are broken. The third log is from going to "book specific preferences", clicking the third button, hitting OK, and then running "create/update xrays". In that path the aliases are populated by the fix I made for issue #5, whereas in the previous two cases that path isn't hit, so things are still broken.