maastrichtlawtech / extraction_libraries

Python libraries for extracting from data sources like Rechtspraak, ECHR, Cellar
Apache License 2.0

link parameter can not be used in echr-extractor #4

Closed NagisYuksel closed 10 months ago

NagisYuksel commented 10 months ago

I entered the URL of an advanced search I made on the HUDOC website as the "link" parameter of the get_echr and get_echr_extra functions in ECHR-extractor, but I get the attached error.

image001

shashankmc commented 10 months ago

Thanks for trying the extractor out. I've tried to replicate your error, and the "invalid decimal literal" error goes away when %20 and %22 are replaced with a space and " respectively. However, that produces a new error.
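The %20 and %22 sequences mentioned above are the percent-encodings of a space and a double quote. As a minimal sketch (not the package's own code), the standard library can decode them in one step; the sample string here is a shortened, made-up search fragment:

```python
from urllib.parse import unquote

# Shortened, illustrative fragment in the style of a HUDOC search URL.
encoded = "{%22fulltext%22:[%22Article%206%22]}"

# unquote() turns %22 into a double quote and %20 into a space.
decoded = unquote(encoded)
print(decoded)  # {"fulltext":["Article 6"]}
```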

@Cloud956 Could you please look into this? The issue persists even if the steps are followed as provided in the appendix.

Cloud956 commented 10 months ago

@NagisYuksel I have published an update which should resolve your issues, please update your package and test it out.

The main problem seems to be caused by the use of "" quotation marks in the search fields, like in your (NOT "has been a violation of Article 6")

This has now been fixed, but only for the 'Text' field. The " quotation marks will still cause issues for other parameters, but from what I saw on the HUDOC website you probably only need them for the 'Text' field of the search. If you need to use quotation marks in other search fields, please reach out again and raise another issue.

I will close the issue, as it is resolved with the new update.

NagisYuksel commented 10 months ago

Thank you very much for your help. However, when I run the code below, I get an error again, and I can't work out what I did wrong.

import echr_extractor as echr
echr_link="https://hudoc.echr.coe.int/eng#{%22fulltext%22:[%22(NOT%20\%22has%20been%20a%20violation%20of%20Article%206\%22)%20AND%20(\%22has%20been%20no%20violation%20of%20Article%206\%22)%22]} https://hudoc.echr.coe.int/eng#%7B%22fulltext%22:[%22(NOT%20%5C%22has%20been%20a%20violation%20of%20Article%206%5C%22)%20AND%20(%5C%22has%20been%20no%20violation%20of%20Article%206%5C%22)%22]%7D"
df, json = echr.get_echr_extra(link=echr_link)


C:\Users\nagihan.yuksel\PycharmProjects\pythonProject2\venv\Scripts\python.exe C:\Users\nagihan.yuksel\PycharmProjects\pythonProject2\main.py
Traceback (most recent call last):
  File "C:\Users\nagihan.yuksel\PycharmProjects\pythonProject2\main.py", line 6, in <module>
    df, json = echr.get_echr_extra(link=echr_link)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\nagihan.yuksel\PycharmProjects\pythonProject2\venv\Lib\site-packages\echr_extractor\echr.py", line 67, in get_echr_extra
    df = get_echr(start_id=start_id, end_id=end_id, start_date=start_date, end_date=end_date, verbose=verbose,
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\nagihan.yuksel\PycharmProjects\pythonProject2\venv\Lib\site-packages\echr_extractor\echr.py", line 27, in get_echr
    df = get_echr_metadata(start_id=start_id, end_id=end_id, start_date=start_date, end_date=end_date,
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\nagihan.yuksel\PycharmProjects\pythonProject2\venv\Lib\site-packages\echr_extractor\ECHR_metadata_harvester.py", line 171, in get_echr_metadata
    META_URL = link_to_query(link)
               ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\nagihan.yuksel\PycharmProjects\pythonProject2\venv\Lib\site-packages\echr_extractor\ECHR_metadata_harvester.py", line 119, in link_to_query
    link_dictionary = eval(link[start:])
                      ^^^^^^^^^^^^^^^^^^
  File "<string>", line 1
    {%2
     ^
SyntaxError: invalid syntax

Process finished with exit code 1
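The traceback above ends in eval() being applied to the still percent-encoded part of the link after the "#", which is not valid Python. As a hedged sketch of one alternative (this is not how the package is implemented; hudoc_fragment_to_dict is a hypothetical helper name), the fragment can be decoded first and then parsed as JSON, which also avoids calling eval() on user input:

```python
import json
from urllib.parse import unquote


def hudoc_fragment_to_dict(link: str) -> dict:
    """Hypothetical helper: parse the part of a HUDOC search URL after '#'."""
    _, sep, fragment = link.partition("#")
    if not sep:
        raise ValueError("link has no '#' fragment")
    # After decoding, the fragment is JSON text, so json.loads can parse it.
    return json.loads(unquote(fragment))


# Raw string so the backslashes from the copied URL are kept literally.
link = (
    r"https://hudoc.echr.coe.int/eng#{%22fulltext%22:"
    r"[%22(NOT%20\%22has%20been%20a%20violation%20of%20Article%206\%22)%22]}"
)
params = hudoc_fragment_to_dict(link)
print(params["fulltext"][0])  # (NOT "has been a violation of Article 6")
```

With this approach, a string that accidentally contains two URLs makes json.loads raise json.JSONDecodeError (a subclass of ValueError) rather than a SyntaxError from eval().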

shashankmc commented 10 months ago

Aren't there like 2 links for the variable echr_link?

NagisYuksel commented 10 months ago

I'm trying to get the following search result:

https://hudoc.echr.coe.int/eng#{%22fulltext%22:[%22(NOT%20\%22has%20been%20a%20violation%20of%20Article%206\%22)%20AND%20(\%22has%20been%20no%20violation%20of%20Article%206\%22)%22]}

Cloud956 commented 10 months ago

@NagisYuksel I just pushed a new version of the package, can you check if it works now?

NagisYuksel commented 10 months ago

Yes it worked, but while HUDOC showed 693 results, the code showed 117993 results. Could it be faulty?

Cloud956 commented 10 months ago

You are correct, I pushed another update. Please check it out and test if all works now.

NagisYuksel commented 10 months ago

It gave 117993 results again. (with version 1.0.38)

Cloud956 commented 10 months ago

Pushed another update, please check this one out.

NagisYuksel commented 10 months ago

Yes, it worked that way. I have one more question: it did not work when I selected only Grand Chamber and Chamber in the menu on the left. If it is possible to fix this, could you please contact me when you have time? I'm sorry for taking up so much of your time.

Cloud956 commented 10 months ago

Pushed another one out, please test it out.

NagisYuksel commented 10 months ago

I got the following error:

INFO:root:--- STARTING ECHR DOWNLOAD FOR ---
Traceback (most recent call last):
  File "C:\Users\nagihan.yuksel\PycharmProjects\pythonProject6\main.py", line 3, in <module>
    df, json = echr.get_echr_extra(link=echr_link, save_file='y')
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\nagihan.yuksel\PycharmProjects\pythonProject6\venv\Lib\site-packages\echr_extractor\echr.py", line 67, in get_echr_extra
    df = get_echr(start_id=start_id, end_id=end_id, start_date=start_date, end_date=end_date, verbose=verbose,
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\nagihan.yuksel\PycharmProjects\pythonProject6\venv\Lib\site-packages\echr_extractor\echr.py", line 27, in get_echr
    df = get_echr_metadata(start_id=start_id, end_id=end_id, start_date=start_date, end_date=end_date,
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\nagihan.yuksel\PycharmProjects\pythonProject6\venv\Lib\site-packages\echr_extractor\ECHR_metadata_harvester.py", line 185, in get_echr_metadata
    META_URL = link_to_query(link)
               ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\nagihan.yuksel\PycharmProjects\pythonProject6\venv\Lib\site-packages\echr_extractor\ECHR_metadata_harvester.py", line 149, in link_to_query
    query_elements.append(funct(key, vals))
                          ^^^^^^^^^^^^^^^^
TypeError: 'NoneType' object is not callable

NagisYuksel commented 10 months ago

The link parameter is as follows:

echr_link="https://hudoc.echr.coe.int/eng#{%22fulltext%22:[%22(\%22has%20been%20a%20violation%20of%20Article%206\%22)%20AND%20NOT(\%22has%20been%20no%20violation%20of%20Article%206\%22)%22],%22sort%22:[%22docnamesort%20Ascending%22],%22documentcollectionid2%22:[%22GRANDCHAMBER%22,%22CHAMBER%22]}"

Cloud956 commented 10 months ago

It's because of the sorting. Is it crucial for you to have the metadata download also sorted? I pushed an update which ignores sorting for now.
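A 'NoneType' object is not callable error of this kind typically appears when a lookup for a handler returns None for an unrecognised key (here, "sort") and the result is then called. The sketch below illustrates the "ignore unknown keys" idea only; HANDLERS, join_values, and the output format are hypothetical and do not reproduce the package's code or HUDOC's real query syntax:

```python
def join_values(key, vals):
    """Hypothetical handler: combine the values of one search field."""
    return f"{key}:(" + " OR ".join(vals) + ")"


# Hypothetical mapping from URL keys to handler functions.
# There is deliberately no entry for "sort".
HANDLERS = {
    "fulltext": join_values,
    "documentcollectionid2": join_values,
}


def build_query_elements(params):
    elements = []
    for key, vals in params.items():
        handler = HANDLERS.get(key)
        if handler is None:
            # Unknown key such as "sort": skip it instead of calling None.
            continue
        elements.append(handler(key, vals))
    return elements


params = {
    "fulltext": ["A"],
    "sort": ["docnamesort Ascending"],
    "documentcollectionid2": ["GRANDCHAMBER", "CHAMBER"],
}
print(build_query_elements(params))
```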

NagisYuksel commented 10 months ago

Sorry, I clicked there by mistake. Of course, the sorting is not important. Thank you very much for your effort.

Cloud956 commented 10 months ago

If it is important, I think it would be easier to do it after the extraction is done and you have a dataframe, so I will skip sorting in this package.
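Sorting after extraction could look roughly like the following pandas sketch. The column names "docname" and "itemid" and the sample rows are assumptions for illustration; check the columns of the dataframe the package actually returns before relying on them:

```python
import pandas as pd

# Stand-in for the dataframe returned by the extractor (made-up rows).
df = pd.DataFrame(
    {
        "itemid": ["001-2", "001-1"],
        "docname": ["CASE OF B v. X", "CASE OF A v. Y"],
    }
)

# Sort by document name, ascending, and renumber the rows.
sorted_df = df.sort_values("docname", ascending=True).reset_index(drop=True)
print(sorted_df)
```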

NagisYuksel commented 10 months ago

Yes I think so too, thank you very much.

Cloud956 commented 10 months ago

@NagisYuksel I pushed an update to the package with a new way to download echr data. Instead of the link to the website, you can use the API call which the website is using to get the search data. This method should 100% not break and does not include any sorting. Full documentation at https://pypi.org/project/echr-extractor/1.0.43/ . I think this should resolve your problems, I'm closing this issue.