Fix asyncio error in tests

cuducos commented 6 years ago

What is the purpose of this Pull Request? The last build, which picked up the updated version of aiohttp, broke our CI tests.

What was done to achieve this purpose? This PR adapts our code to the new error architecture of the current version of aiohttp, using TimeoutError from concurrent.futures instead of the one from aiohttp.client_exceptions.
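A minimal sketch of the idea, using only the standard library so it runs without aiohttp. The names `slow_download`, `download_or_none` and `run` are hypothetical and are not taken from the toolbox code. It assumes the timeout surfaces as `asyncio.TimeoutError`; on the Python 3.6 used in this thread that name should be the same class as `concurrent.futures.TimeoutError`, but the two are not guaranteed to match on every Python version, so the sketch catches both.

```python
import asyncio
import concurrent.futures

# Catch both names so the sketch behaves the same across Python versions.
TIMEOUT_ERRORS = (asyncio.TimeoutError, concurrent.futures.TimeoutError)


async def slow_download(delay):
    """Hypothetical stand-in for an HTTP request that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return "content"


async def download_or_none(delay, timeout):
    """Return the downloaded content, or None if the timeout is exceeded."""
    try:
        return await asyncio.wait_for(slow_download(delay), timeout=timeout)
    except TIMEOUT_ERRORS:
        return None


def run(coro):
    """Run a coroutine on a fresh event loop and close the loop afterwards."""
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()


fast = run(download_or_none(delay=0.01, timeout=1))
timed_out = run(download_or_none(delay=1, timeout=0.01))
```

Here `fast` holds the downloaded content, while `timed_out` is `None` because the simulated request outlasted its timeout.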

How to test if it really works? Run $ pytest, or try to download something and hope the server takes long enough to cause a timeout ; )

Who can help reviewing it? @anaschwendler

anaschwendler commented 6 years ago

Hi @cuducos, it seems that the tests keep failing. I'll put the details below.

What I did to test this PR:

Cloned the project:

$ git clone git@github.com:datasciencebr/serenata-toolbox.git

Change to its folder:
```
$ cd serenata-toolbox
```

Change to @cuducos’ branch:

$ git fetch origin
$ git checkout -b cuducos-fix-aiohttp-error origin/cuducos-fix-aiohttp-error
$ git merge master

Run the tests:

$ pip install pytest pytest-cov
$ pytest

What I'm thinking: the problem is no longer the datasets on Câmara, but something else that I couldn't identify. This PR fixes an error with the asyncio library and unblocks the unit tests. What do you think about adding this refactor to the code and then fixing the problem with the journey tests?

```
➜ serenata-toolbox git:(cuducos-fix-aiohttp-error) ✗ pytest
=========================================================== test session starts ===========================================================
platform darwin -- Python 3.6.2, pytest-3.2.1, py-1.4.34, pluggy-0.4.0 -- /Users/anaschwendler/anaconda3/bin/python
cachedir: .cache
rootdir: /Users/anaschwendler/Documents/projects/serenata-toolbox, inifile: pytest.ini
plugins: cov-2.5.1
collected 70 items

tests/journey/test_chamber_of_deputies_dataset.py::TestChamberOfDeputiesDataset::test_fetch_translate_clean_integration PASSED
tests/journey/test_chamber_of_deputies_deputies_dataset.py::TestDeputiesDataset::test_fetch FAILED
tests/journey/test_chamber_of_deputies_official_missions_dataset.py::TestOfficialMissionsDataset::test_fetch PASSED
tests/journey/test_chamber_of_deputies_presences_dataset.py::TestPresencesDataset::test_fetch PASSED
tests/journey/test_chamber_of_deputies_session_start_times_dataset.py::TestSpeechesDataset::test_fetch PASSED
tests/journey/test_chamber_of_deputies_speeches_dataset.py::TestSpeechesDataset::test_fetch PASSED
tests/journey/test_federal_senate_dataset.py::TestJourneyFederalSenateDataset::test_journey_federal_senate_dataset PASSED
tests/unit/test_datasets.py::TestDatasets::test_init_with_local_directory PASSED
tests/unit/test_datasets.py::TestDatasets::test_init_without_local_directory PASSED
tests/unit/test_datasets.py::TestDatasets::test_pending PASSED
tests/unit/test_datasets.py::TestDatasets::test_upload_all PASSED
tests/unit/test_datasets.py::TestFetch::test_fetch PASSED
tests/unit/test_datasets.py::TestFetch::test_fetch_latest_backup PASSED
tests/unit/test_datasets.py::TestFetch::test_fetch_latest_backup_only_when_missing PASSED
tests/unit/test_datasets.py::TestFetch::test_fetch_latest_backup_with_force_all PASSED
tests/unit/test_datasets_contextmanager.py::TestContextManager::test_status_message PASSED
tests/unit/test_datasets_downloader.py::TestDownloader::test_download_multiple_files PASSED
tests/unit/test_datasets_downloader.py::TestDownloader::test_download_no_file PASSED
tests/unit/test_datasets_downloader.py::TestDownloader::test_download_single_file PASSED
tests/unit/test_datasets_downloader.py::TestDownloader::test_download_timeout PASSED
tests/unit/test_datasets_downloader.py::TestDownloader::test_init PASSED
tests/unit/test_datasets_downloader.py::TestDownloader::test_init_file_target PASSED
tests/unit/test_datasets_downloader.py::TestDownloader::test_init_no_bucket PASSED
tests/unit/test_datasets_downloader.py::TestDownloader::test_init_no_existing_target PASSED
tests/unit/test_datasets_downloader.py::TestDownloader::test_init_no_region PASSED
tests/unit/test_datasets_downloader.py::TestDownloader::test_init_no_region_no_bucket PASSED
tests/unit/test_datasets_downloader.py::TestDownloader::test_init_no_timeout PASSED
tests/unit/test_datasets_downloader.py::TestDownloader::test_url PASSED
tests/unit/test_datasets_helpers.py::TestDatasetsHelpersXml::test_extract_date_default_to_br_format PASSED
tests/unit/test_datasets_helpers.py::TestDatasetsHelpersXml::test_extract_date_supports_custom_format PASSED
tests/unit/test_datasets_helpers.py::TestDatasetsHelpersXml::test_extract_datetime_default_to_br_format PASSED
tests/unit/test_datasets_helpers.py::TestDatasetsHelpersXml::test_extract_datetime_supports_custom_format PASSED
tests/unit/test_datasets_helpers.py::TestDatasetsHelpersXml::test_extract_text PASSED
tests/unit/test_datasets_helpers.py::TestDatasetsHelpersDataframes::test_save_to_csv PASSED
tests/unit/test_datasets_helpers.py::TestDatasetsHelpersDataframes::test_translate_column PASSED
tests/unit/test_datasets_helpers.py::TestDatasetsHelpersConfigLookup::test_find_config PASSED
tests/unit/test_datasets_helpers.py::TestDatasetsHelpersConfigLookup::test_find_config_a_file PASSED
tests/unit/test_datasets_helpers.py::TestDatasetsHelpersConfigLookup::test_find_config_not_found PASSED
tests/unit/test_datasets_local.py::TestLocal::test_all PASSED
tests/unit/test_datasets_local.py::TestLocal::test_delete PASSED
tests/unit/test_datasets_local.py::TestLocal::test_delete_dir PASSED
tests/unit/test_datasets_local.py::TestLocal::test_delete_non_existent_file PASSED
tests/unit/test_datasets_local.py::TestLocal::test_init PASSED
tests/unit/test_datasets_local.py::TestLocal::test_init_non_existent_dir PASSED
tests/unit/test_datasets_local.py::TestLocal::test_init_with_file_path PASSED
tests/unit/test_datasets_remote.py::TestRemote::test_all PASSED
tests/unit/test_datasets_remote.py::TestRemote::test_bucket PASSED
tests/unit/test_datasets_remote.py::TestRemote::test_bucket_no_config PASSED
tests/unit/test_datasets_remote.py::TestRemote::test_bucket_no_section PASSED
tests/unit/test_datasets_remote.py::TestRemote::test_config_exists_when_it_doesnt_exist PASSED
tests/unit/test_datasets_remote.py::TestRemote::test_config_exists_when_it_is_a_directory PASSED
tests/unit/test_datasets_remote.py::TestRemote::test_config_exists_when_it_is_a_file PASSED
tests/unit/test_datasets_remote.py::TestRemote::test_delete PASSED
tests/unit/test_datasets_remote.py::TestRemote::test_init_with_old_config PASSED
tests/unit/test_datasets_remote.py::TestRemote::test_init_without_config PASSED
tests/unit/test_datasets_remote.py::TestRemote::test_s3 PASSED
tests/unit/test_datasets_remote.py::TestRemote::test_successful_init PASSED
tests/unit/test_datasets_remote.py::TestRemote::test_upload PASSED
tests/unit/test_federal_senate_dataset.py::TestFederalSenateDataset::test_dataset_cleanup PASSED
tests/unit/test_federal_senate_dataset.py::TestFederalSenateDataset::test_dataset_translation PASSED
tests/unit/test_federal_senate_dataset.py::TestFederalSenateDataset::test_dataset_translation_failing_to_find_file PASSED
tests/unit/test_federal_senate_dataset.py::TestFederalSenateDataset::test_fetch_files_from_S3 PASSED
tests/unit/test_federal_senate_dataset.py::TestFederalSenateDataset::test_fetch_raises_HTTPError PASSED
tests/unit/test_federal_senate_dataset.py::TestFederalSenateDataset::test_fetch_raises_URLError PASSED
tests/unit/test_federal_senate_dataset.py::TestFederalSenateDataset::test_if_translation_happened_as_expected PASSED
tests/unit/test_reimbursements.py::TestReimbursements::test_aggregates_partial_reimbursements_in_single_record PASSED
tests/unit/chambers_of_deputies/test_chamber_of_deputies_dataset.py::TestChamberOfDeputiesDataset::test_clean_2017_reimbursements PASSED
tests/unit/chambers_of_deputies/test_chamber_of_deputies_dataset.py::TestChamberOfDeputiesDataset::test_fetch_chambers_of_deputies_datasets PASSED
tests/unit/chambers_of_deputies/test_chamber_of_deputies_dataset.py::TestChamberOfDeputiesDataset::test_translate_2017_dataset PASSED
tests/unit/chambers_of_deputies/test_chamber_of_deputies_dataset.py::TestChamberOfDeputiesDataset::test_translate_csv_with_reimbursement_with_net_value_with_decimal_comma PASSED

======================================================== slowest 10 test durations ========================================================
815.58s call tests/journey/test_chamber_of_deputies_dataset.py::TestChamberOfDeputiesDataset::test_fetch_translate_clean_integration
230.68s call tests/journey/test_federal_senate_dataset.py::TestJourneyFederalSenateDataset::test_journey_federal_senate_dataset
6.01s call tests/journey/test_chamber_of_deputies_session_start_times_dataset.py::TestSpeechesDataset::test_fetch
4.44s call tests/journey/test_chamber_of_deputies_deputies_dataset.py::TestDeputiesDataset::test_fetch
3.80s call tests/journey/test_chamber_of_deputies_official_missions_dataset.py::TestOfficialMissionsDataset::test_fetch
2.28s call tests/journey/test_chamber_of_deputies_presences_dataset.py::TestPresencesDataset::test_fetch
2.27s call tests/journey/test_chamber_of_deputies_speeches_dataset.py::TestSpeechesDataset::test_fetch
1.01s call tests/unit/test_datasets_downloader.py::TestDownloader::test_download_timeout
0.11s call tests/unit/test_reimbursements.py::TestReimbursements::test_aggregates_partial_reimbursements_in_single_record
0.05s call tests/unit/chambers_of_deputies/test_chamber_of_deputies_dataset.py::TestChamberOfDeputiesDataset::test_translate_2017_dataset

================================================================ FAILURES =================================================================
_____________________________________________________ TestDeputiesDataset.test_fetch ______________________________________________________

self =

    def test_fetch(self):
        df = self.subject.fetch()
        actualColumns = df.columns

        expectedColumns = [
            'congressperson_id', 'budget_id', 'condition', 'congressperson_document',
            'civil_name', 'congressperson_name', 'picture_url', 'gender', 'state',
            'party', 'phone_number', 'email'
        ]
        self.assertTrue((np.array(expectedColumns) == np.array(actualColumns)).all())

        expectedGenders = ['male', 'female']
        actualGenders = df.gender.unique()
        self.assertTrue((np.array(expectedGenders) == np.array(actualGenders)).all())

        expectedConditions = ['Holder', 'Substitute']
        actualConditions = df.condition.unique()
>       self.assertTrue((np.array(expectedConditions) == np.array(actualConditions)).all())
E       AssertionError: False is not true

tests/journey/test_chamber_of_deputies_deputies_dataset.py:29: AssertionError
============================================================ warnings summary =============================================================
tests/journey/test_chamber_of_deputies_dataset.py::TestChamberOfDeputiesDataset::test_fetch_translate_clean_integration
  /Users/anaschwendler/anaconda3/lib/python3.6/unittest/case.py:605: DtypeWarning: Columns (26) have mixed types. Specify dtype option on import or set low_memory=False.
    testMethod()
  /Users/anaschwendler/anaconda3/lib/python3.6/unittest/case.py:605: DtypeWarning: Columns (10,26) have mixed types. Specify dtype option on import or set low_memory=False.
    testMethod()
  /Users/anaschwendler/anaconda3/lib/python3.6/site-packages/_pytest/unittest.py:176: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.
    self._testcase(result=self)

-- Docs: http://doc.pytest.org/en/latest/warnings.html
=========================================== 1 failed, 69 passed, 3 warnings in 1067.69 seconds ============================================
```

anaschwendler commented 6 years ago

Running the failing test on its own, I can see that the only difference is the order: expectedConditions = ['Holder', 'Substitute'], while actualConditions = df.condition.unique() shows [Substitute, Holder] in the log:

```
➜ serenata-toolbox git:(cuducos-fix-aiohttp-error) ✗ pytest tests/journey/test_chamber_of_deputies_deputies_dataset.py
=========================================================== test session starts ===========================================================
platform darwin -- Python 3.6.2, pytest-3.2.1, py-1.4.34, pluggy-0.4.0 -- /Users/anaschwendler/anaconda3/bin/python
cachedir: .cache
rootdir: /Users/anaschwendler/Documents/projects/serenata-toolbox, inifile: pytest.ini
plugins: cov-2.5.1
collected 1 item

tests/journey/test_chamber_of_deputies_deputies_dataset.py::TestDeputiesDataset::test_fetch FAILED

======================================================== slowest 10 test durations ========================================================
2.08s call tests/journey/test_chamber_of_deputies_deputies_dataset.py::TestDeputiesDataset::test_fetch
0.00s setup tests/journey/test_chamber_of_deputies_deputies_dataset.py::TestDeputiesDataset::test_fetch
0.00s teardown tests/journey/test_chamber_of_deputies_deputies_dataset.py::TestDeputiesDataset::test_fetch

================================================================ FAILURES =================================================================
_____________________________________________________ TestDeputiesDataset.test_fetch ______________________________________________________

self =

    def test_fetch(self):
        df = self.subject.fetch()
        actualColumns = df.columns

        expectedColumns = [
            'congressperson_id', 'budget_id', 'condition', 'congressperson_document',
            'civil_name', 'congressperson_name', 'picture_url', 'gender', 'state',
            'party', 'phone_number', 'email'
        ]
        self.assertTrue((np.array(expectedColumns) == np.array(actualColumns)).all())

        expectedGenders = ['male', 'female']
        actualGenders = df.gender.unique()
        self.assertTrue((np.array(expectedGenders) == np.array(actualGenders)).all())

        expectedConditions = ['Holder', 'Substitute']
        actualConditions = df.condition.unique()
        print(actualConditions)
>       self.assertTrue((np.array(expectedConditions) == np.array(actualConditions)).all())
E       AssertionError: False is not true

tests/journey/test_chamber_of_deputies_deputies_dataset.py:30: AssertionError
---------------------------------------------------------- Captured stdout call -----------------------------------------------------------
[Substitute, Holder]
Categories (2, object): [Substitute, Holder]
======================================================== 1 failed in 2.95 seconds =========================================================
```

So I suggest merging this PR and then solving the problem with the order of expectedConditions = ['Holder', 'Substitute'], to check whether we need to explore more ;)

cuducos commented 6 years ago

> So I suggest merging this PR and then solving the problem with the order of expectedConditions = ['Holder', 'Substitute'], to check whether we need to explore more ;)

IMHO it makes sense: this error has nothing to do with this PR.

> the only difference is the order: expectedConditions = ['Holder', 'Substitute'], while actualConditions = df.condition.unique() shows [Substitute, Holder]

BTW using a set instead of a list fixes that error ; )
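A minimal sketch of why the set comparison helps. It uses plain Python lists in place of the NumPy arrays from the test (an assumption made to keep the example dependency-free), with the values copied from the log above. Comparing as sets ignores both order and duplicates, which fits values coming from `unique()`.

```python
# Values taken from the failing test and its captured stdout.
expected_conditions = ['Holder', 'Substitute']
actual_conditions = ['Substitute', 'Holder']

# Position-by-position comparison fails because the order differs.
order_sensitive = expected_conditions == actual_conditions

# Comparing as sets only checks that both contain the same values.
order_insensitive = set(expected_conditions) == set(actual_conditions)
```

In a `unittest` test case, the same check could be written as `self.assertEqual(set(expectedConditions), set(actualConditions))`.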

anaschwendler commented 6 years ago

> BTW using a set instead of a list fixes that error ; )

Based on what I learnt yesterday, I'll open a PR fixing that soon.

okfn-brasil / serenata-toolbox

Fix asyncio error in tests #162