inspirehep / hepcrawl

Scrapy project for feeds into INSPIRE-HEP
http://inspirehep.net

tests: python2/3 compatibility #266

Closed tsgit closed 5 years ago

tsgit commented 5 years ago
* ensure tests pass under python2 and python3

* use deepdiff for comparing nested data-structures

* remove deep_sort, because sorting a list of dicts
  is not supported in python3

* accommodate differences in str types between python 2/3

Signed-off-by: Thorsten Schwander <thorsten.schwander@gmail.com>
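The deep_sort point in the commit message can be reproduced in a few lines. This is a minimal sketch, not code from hepcrawl: the sample records and the `canonical_key` helper are assumptions made for illustration. It shows that Python 3 refuses to order dicts, and that a deterministic key is needed if a sort is wanted at all.

```python
import json

# Illustrative data only; not taken from the hepcrawl test fixtures.
records = [{"value": "DESY"}, {"value": "CERN"}]

# Python 3 defines no ordering between dicts, so a plain sort raises TypeError.
# (Python 2 used an arbitrary but consistent ordering instead.)
try:
    sorted(records)
    plain_sort_error = None
except TypeError as exc:
    plain_sort_error = exc


def canonical_key(item):
    """Hypothetical helper: serialize an item deterministically for use as a sort key."""
    return json.dumps(item, sort_keys=True)


# Sorting with an explicit key works on both Python 2 and Python 3.
ordered = sorted(records, key=canonical_key)
```

Whether such a sort is desirable is a separate question: it discards the original order, which may itself be significant.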

Description

Improve compatibility with python3 by using six. Update tests so that all existing unit tests pass with both python 2.7 and python 3.7.
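The str-type differences mentioned above typically show up when a test compares text with bytes. The sketch below uses only the standard library; `to_text` is a hypothetical helper, not part of hepcrawl or six, and the sample values are made up. six offers names such as `six.text_type` and `six.binary_type` for writing this kind of check in a way that runs on both versions.

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: recursively decode bytes to text in nested data.

    Tuples are returned as lists, which is enough for comparing JSON-like data.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, dict):
        return {to_text(k, encoding): to_text(v, encoding) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_text(v, encoding) for v in value]
    return value


# Illustrative values: text on one side, bytes on the other.
expected = {"full_name": "Turkot, Oleksii"}
result = {b"full_name": b"Turkot, Oleksii"}

# On Python 3 bytes and str never compare equal, so the raw comparison fails.
raw_equal = expected == result
# After normalizing both sides to text the comparison succeeds.
normalized_equal = to_text(expected) == to_text(result)
```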

Related Issue

Motivation and Context

Checklist:

tsgit commented 5 years ago

I agree that python: - '3.7' should be added to .travis.yml. I'm not sure it'll work without some other modifications, though.

since you commented on import order, I improved that according to isort in several places

I use DeepDiff everywhere a direct comparison expected == result would otherwise fail with the current tests. those tests should probably be improved instead.

tsgit commented 5 years ago

mmh, so for example in the desy test the MARC tags in the XML file aren't sorted numerically, which should be fine, since that's not a requirement

however the sort order of the expected json reflects that -- while I think the author from MARC 100 should be before MARC 70x in the converted json

https://github.com/inspirehep/hepcrawl/blob/master/tests/unit/responses/desy/desy_record.xml 701 appears before 100

https://github.com/inspirehep/hepcrawl/blob/master/tests/unit/responses/desy/desy_record_expected.json preserves that order

the actual test output is correct in this case and changes the order to what it should be

 'authors': [{'affiliations': [{'value': 'DESY'}],
   'full_name': 'Turkot, Oleksii',
   'ids': [{'schema': 'ORCID', 'value': '0000-0001-5352-7744'}],
   'raw_affiliations': [{'value': 'Deutsches Elektronen-Synchrotron'}]},
  {'full_name': 'Foster, Brian',
   'ids': [{'schema': 'ORCID', 'value': '0000-0001-5699-3046'}],
   'inspire_roles': ['supervisor'],
   'raw_affiliations': [{'value': 'Deutsches Elektronen-Synchrotron'}]},
  {'full_name': 'Wichmann, Katarzyna',
   'inspire_roles': ['supervisor'],
   'raw_affiliations': [{'value': 'Deutsches Elektronen-Synchrotron'}]}],

so I guess deep_sort was a flawed and unnecessary workaround?
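To illustrate why sorting both sides before comparing can be considered flawed, here is a minimal sketch. `deep_sort_like` is a hypothetical stand-in written for this example, not the removed hepcrawl function, and the two records are simplified, made-up versions of the author lists discussed above.

```python
import json


def deep_sort_like(value):
    """Hypothetical stand-in for a deep sort: recursively sort every list."""
    if isinstance(value, dict):
        return {k: deep_sort_like(v) for k, v in value.items()}
    if isinstance(value, list):
        return sorted(
            (deep_sort_like(v) for v in value),
            key=lambda v: json.dumps(v, sort_keys=True),
        )
    return value


# Simplified illustration: a fixture that lists a MARC 701 author first ...
expected = {"authors": [{"full_name": "Foster, Brian"},
                        {"full_name": "Turkot, Oleksii"}]}
# ... and converter output that puts the MARC 100 author first.
result = {"authors": [{"full_name": "Turkot, Oleksii"},
                      {"full_name": "Foster, Brian"}]}

# A direct comparison notices the different author order.
direct_equal = expected == result
# Sorting both sides first hides it, so an ordering bug would go undetected.
sorted_equal = deep_sort_like(expected) == deep_sort_like(result)
```

In this sketch the direct comparison is the stricter check: it fails until the fixture reflects the intended order, whereas the sorted comparison passes either way.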

michamos commented 5 years ago

@tsgit on python 3 the documentation fails to build. This seems to be due to it using Sphinx 2.0 (which is Python 3 only) and the hepcrawl config being incompatible with it. You can either try fixing the config to make it compatible, or if you don't want to bother, simply pin sphinx to <2.0.

tsgit commented 5 years ago

well there is a snowball effect of things just in building the docs

I fixed the immediate issue with sphinx, however there is also an issue with the use of autosemver and python 3, and then there is an issue with scrapyd < 2.0 and python3, and maybe other things

michamos commented 5 years ago

@tsgit It would probably be easiest to disable docs building in https://github.com/inspirehep/hepcrawl/blob/02a2d9919c8b1ce904c1b7a85d1072e2ba87834e/docker-compose.test.py3.yml#L69. What do you think?