pdfliberation / whatwordwhere

Tooling to extract data from scanned paper forms OCR-ed by Tesseract using the HOCR standard.

Error loading test data #9

Closed alexbyrnes closed 10 years ago

alexbyrnes commented 10 years ago
python manage.py test_load 

gives me

File "/home/vagrant/whatwordwhere/hocr_util/load_utils/load_page.py", line 77, in enter_words
    cursor.copy_expert(sql, transactions_to_commit, size=length)
psycopg2.DataError: Geometry SRID (0) does not match column SRID (4326)
CONTEXT:  COPY documents_pageword, line 1, column poly: "010300000001000000050000000000000000003C4000000000000000000000000000003C4000000000000034400000000000..."

There's a constraint on spatial_ref_sys that says srid has to be greater than zero. That's as much as I know at this point.
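The hex in the CONTEXT line above can be read directly, which helps confirm what the error says. Below is a minimal sketch, assuming the value is PostGIS hex (E)WKB: the first byte is the byte order, the next four are the geometry type word, and an embedded SRID is only present when the 0x20000000 flag is set in that word. The function name `wkb_header` is hypothetical and only the header is decoded.

```python
import struct

# Flag bit PostGIS sets in the type word when an SRID follows the header (EWKB).
EWKB_SRID_FLAG = 0x20000000


def wkb_header(hex_wkb):
    """Return (geometry_type_code, srid_or_None) from a hex (E)WKB string."""
    head = bytes.fromhex(hex_wkb[:10])
    # Byte order marker: 1 = little-endian, 0 = big-endian.
    endian = "<" if head[0] == 1 else ">"
    (type_word,) = struct.unpack(endian + "I", head[1:5])
    srid = None
    if type_word & EWKB_SRID_FLAG:
        # With the flag set, the next four bytes hold the SRID.
        (srid,) = struct.unpack(endian + "I", bytes.fromhex(hex_wkb[10:18]))
    # The low byte carries the base geometry type (3 = Polygon).
    return type_word & 0xFF, srid


# Prefix copied from the COPY error above: a Polygon with no embedded SRID.
print(wkb_header("0103000000"))
# For comparison, a Polygon header that does carry SRID 4326.
print(wkb_header("0103000020E6100000"))
```

If the first call reports no SRID, the rows handed to COPY are plain WKB, so PostGIS treats them as SRID 0 and rejects them against a 4326 column.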

jsfenfen commented 10 years ago

See the text in documents/models.py... Ya gotta drop the default constraints with something like this:

alter table documents_pageword drop constraint "enforce_srid_poly";
alter table documents_pageword drop constraint "enforce_dims_poly";
alter table documents_pageword drop constraint "enforce_geotype_poly";
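A minimal sketch of generating those statements from Python, in case they need to be run from a script. This is not the code from the repo; the helper name `drop_constraint_sql` is hypothetical, and the `IF EXISTS` clause (available in PostgreSQL 9.0 and later) is an addition so the statement does not error out when a constraint is already absent.

```python
def drop_constraint_sql(table, column, kinds=("srid", "dims", "geotype")):
    """Build ALTER TABLE statements for the enforce_* constraints on a column.

    table and column are assumed to be trusted, hard-coded identifiers;
    they are interpolated directly and must never come from user input.
    """
    template = 'ALTER TABLE {table} DROP CONSTRAINT IF EXISTS "enforce_{kind}_{column}";'
    return [
        template.format(table=table, kind=kind, column=column)
        for kind in kinds
    ]


for statement in drop_constraint_sql("documents_pageword", "poly"):
    print(statement)
```

Each string could then be passed to a database cursor's `execute` call.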

jsfenfen commented 10 years ago

This is gnarlier to solve at a programmatic level. The standard custom SQL solution (i.e. putting it in documents/sql/PageWord.sql) fails, and there seems to be a specific GeoDjango post_syncdb bug as well; see https://code.djangoproject.com/ticket/7561. So cleanly killing the constraints from a post_syncdb signal appears not to work either. It will actually work if syncdb is run twice, because the constraints only exist by the second run. Which isn't ideal either.

jsfenfen commented 10 years ago

Added this as a management command and noted it in the docs. See https://github.com/jsfenfen/whatwordwhere/commit/62b804dbac8bfbd34e242187637f2f2bcc6e6241.

alexbyrnes commented 10 years ago

I tried the sql last night and got

constraint "enforce_srid_poly" of relation "documents_pageword" does not exist

Seems to be the same for the command. I don't think that's the constraint being enforced. It looks like GeoDjango's default SRID is 4326. Is this an artificial constraint imposed by GeoDjango? I don't see any constraints left to drop.

jsfenfen commented 10 years ago

Can you try this against the main repo? Not quite sure what's up with this branch.

You might look at something like this:

select * from geometry_columns;

I can't reproduce this problem--what postgres and postgis version do you have?
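A minimal sketch of running that check from Python and narrowing it to the column in question. It assumes a DB-API cursor (for example one from psycopg2) and the standard geometry_columns fields `f_table_name`, `f_geometry_column`, `srid`, `type` and `coord_dimension`; the function name `geometry_column_info` is hypothetical. As far as I know, on PostGIS 2.x geometry_columns is a view and the SRID is usually enforced through the column's type modifier rather than enforce_* check constraints, which would be consistent with the "does not exist" error above, but that is a guess for this setup.

```python
GEOMETRY_COLUMN_SQL = (
    "SELECT srid, type, coord_dimension "
    "FROM geometry_columns "
    "WHERE f_table_name = %s AND f_geometry_column = %s"
)


def geometry_column_info(cursor, table, column):
    """Return the registered SRID, type and dimension for one geometry column.

    cursor is any DB-API cursor that uses %s placeholders.
    Returns None when the column is not listed in geometry_columns.
    """
    cursor.execute(GEOMETRY_COLUMN_SQL, (table, column))
    row = cursor.fetchone()
    if row is None:
        return None
    srid, geom_type, dims = row
    return {"srid": srid, "type": geom_type, "dims": dims}
```

Calling `geometry_column_info(cursor, "documents_pageword", "poly")` on an open connection shows which SRID the database expects for the column.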

alexbyrnes commented 10 years ago

I tried a quick reboot with the other repo at the end of the day yesterday, no dice. I'm using the postgresql-9.3-postgis Debian package.

If you can't reproduce it no problem. I'll check it after the next update.