tilburgsciencehub / website

Learn to work more efficiently on empirical research projects.
https://tilburgsciencehub.com

[Q]: Is there a program which can assist in searching for specific keywords in open databases of certain Courts? #303

Closed Jacintha777 closed 2 years ago

Jacintha777 commented 2 years ago

Contact Details

j.r.k.asarfi@tilburguniversity.edu

Shoot!

I am looking into the case law of national courts, and for this I have to search for specific keywords in two databases. These databases are http://www.ttlawcourts.org/index.php/law-library/search-librarys-holdings (Trinidad and Tobago) and http://rechtspraak.sr/ (Suriname). In the first database, the keywords to search for are: referral, referral jurisdiction, Article 214 RTC, Caribbean Court of Justice, CCJ. The keywords for the second database are in Dutch, namely: verwijzingsprocedure, Herziene Verdrag van Chaguaramas, Caribisch Hof van Justitie, verwijzing naar het Caribisch Hof van Justitie.

The expectation is that searching these databases for these specific keywords will result in cases which are relevant for my research.

Looking forward to your reply.


hannesdatta commented 2 years ago

Hi @Jacintha777, thanks a bunch. Could you please provide a bit more detail on how the search is going to be executed?

1) With regard to ttlawcourts:

At http://www.ttlawcourts.org/index.php/law-library/search-librarys-holdings, I do not see a "keyword" field. image

Further, does any filter need to be used on the Documents? image

Further, please specify how the search results should be saved:

image

image

2) With regard to rechtspraak.sr

In formulating this issue, please imagine you are instructing a Research Assistant to strictly follow a particular procedure. Without any "thinking". Just executing a procedure. That way, we can instruct a program to do the same thing. Thanks!

Jacintha777 commented 2 years ago

Dear @hannesdatta, with regard to your first point: click on Supreme Court (second on the left under the logo of the Judiciary of Trinidad & Tobago) and then on High Court. You will then see the "search the site" option, where you can search for the keywords. I just tried this with the word 'referral' and found cases in which 'referral' is highlighted. The same procedure can be followed for "Court of Appeal".

Jacintha777 commented 2 years ago

With regard to the filter on the documents: 'judgments' is preferred

Jacintha777 commented 2 years ago

Concerning how the search results should be saved: an excel file containing the name of the case and a sentence before and after the keyword to determine whether the keyword is used in the context of e.g. 'referral to the CCJ'.
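
The output format requested here (case name plus the sentence before and after the keyword) could be produced along these lines. This is a minimal sketch, not the code from the project notebook: the function names `keyword_contexts` and `write_contexts`, the column names, and the naive sentence splitter are assumptions made for illustration. It writes CSV, which Excel can open directly.

```python
import csv
import re


def keyword_contexts(text, keyword):
    """Return (before, match, after) sentence triples for each sentence containing keyword."""
    # Naive split on ., ! or ? followed by whitespace. Legal texts with
    # abbreviations such as "v." or "No." will be split too eagerly.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    hits = []
    for i, sentence in enumerate(sentences):
        if pattern.search(sentence):
            before = sentences[i - 1] if i > 0 else ""
            after = sentences[i + 1] if i + 1 < len(sentences) else ""
            hits.append((before, sentence, after))
    return hits


def write_contexts(case_name, keyword, hits, handle):
    """Write one CSV row per hit to an open text handle (hypothetical column layout)."""
    writer = csv.writer(handle)
    writer.writerow(["case_name", "keyword", "before", "match", "after"])
    for before, match, after in hits:
        writer.writerow([case_name, keyword, before, match, after])
```

To use it, pass the text of one case to `keyword_contexts`, then hand the hits to `write_contexts` together with a file opened via `open("results.csv", "w", newline="", encoding="utf-8")`.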

Jacintha777 commented 2 years ago

Although I would appreciate it if the scraper could download the pdf document, if that is possible of course.

Jacintha777 commented 2 years ago

With regard to rechtspraak.sr: the search should be conducted at https://rechtspraak.sr/uitspraken-databank/eenvoudig-zoeken/ (simple search). The link that you are referring to is a more elaborate search that requires specific information such as the case number, which complicates the search because I do not have that information.

Jacintha777 commented 2 years ago

With regard to the format: the "text" as shown at https://rechtspraak.sr/sru-hvj-2020-6/ is fine, as it helps to determine the context in which the keyword is used. If it is possible to highlight the keyword in the document, that would be great.
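
Highlighting a keyword in scraped text is straightforward once the text is available as a string. The sketch below wraps every case-insensitive occurrence in markers; the function name `highlight` and the default `**` markers are assumptions chosen for illustration, not part of the project code.

```python
import re


def highlight(text, keyword, left="**", right="**"):
    """Wrap each case-insensitive occurrence of keyword in left/right markers."""
    if not keyword:
        # An empty pattern would match between every character.
        return text
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    # Re-insert the matched text itself so the original capitalisation is kept.
    return pattern.sub(lambda m: f"{left}{m.group(0)}{right}", text)
```

For HTML output, the markers could be swapped for `<mark>` and `</mark>`.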

Jacintha777 commented 2 years ago

@hannesdatta, please let me know if you require further information. Thank you

hannesdatta commented 2 years ago

@BilgeKasapoglu, is this something you could handle? I'd say develop it for the first site, and we can then check how it performs. Please invest about 2-3 hours for now. Maybe set up a meeting with Jacintha to clarify any issues.

I would try Beautiful Soup first, by the way. Selenium may be overkill. Check the tutorials at odcm.hannesdatta.com for code snippets.

BilgeKasapoglu commented 2 years ago

Dear @hannesdatta,

I keep getting "SSLCertVerificationError" when I try to request the URL. Do you have any experience with such an error? Thank you

Best Bilge

hannesdatta commented 2 years ago

@BilgeKasapoglu, did you try to google this error? This search result seems to be relevant. Let me know please.

https://stackoverflow.com/questions/10667960/python-requests-throwing-sslerror
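
One common cause of `SSLCertVerificationError` is that Python cannot build a trusted chain to the server's certificate, for instance when the server does not send an intermediate certificate. Whether that applies to this site is not established in the thread. The standard-library sketch below shows the two usual responses: trusting a specific certificate bundle, or (for debugging only) turning verification off. `build_ssl_context` and `fetch` are hypothetical helper names, and the `ca_file` path is something you would have to supply yourself.

```python
import ssl
import urllib.request


def build_ssl_context(ca_file=None, insecure=False):
    """Return an SSLContext for HTTPS requests.

    ca_file:  optional path to a PEM bundle with the site's certificate chain.
    insecure: if True, skip certificate verification entirely (debugging only).
    """
    context = ssl.create_default_context(cafile=ca_file)
    if insecure:
        # Order matters: hostname checking must be disabled before CERT_NONE is accepted.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


def fetch(url, context, timeout=30):
    """Download url with the given SSL context and return the raw bytes."""
    with urllib.request.urlopen(url, context=context, timeout=timeout) as response:
        return response.read()
```

With the `requests` library discussed in the linked answer, the analogous setting is the `verify` argument, which accepts either a path to a certificate bundle or `False`.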

Jacintha777 commented 2 years ago

@hannesdatta and @BilgeKasapoglu, thanks and curiously following your updates. I am available to meet on 15 and 16 March so let me know.

hannesdatta commented 2 years ago

@BilgeKasapoglu, let us know whether any input is required for working on this.

hannesdatta commented 2 years ago

@BilgeKasapoglu, also inform Jacintha about the expected date of delivery (plus allow some time for me to review the final product).

BilgeKasapoglu commented 2 years ago

Dear @hannesdatta and @Jacintha777

I think I can work on this on Thursday if it is okay with you. I can send it to you by Friday noon, @hannesdatta. Thank you

Best Bilge

Jacintha777 commented 2 years ago

Dear @BilgeKasapoglu, that sounds great. I look forward to the results after @hannesdatta has reviewed the final product.
Kind regards, Jacintha

Jacintha777 commented 2 years ago

@hannesdatta and @BilgeKasapoglu, my apologies, I closed this issue by mistake. What I also wanted to comment on: this is for the website of Trinidad and Tobago and I am really pleased to hear from both of you that it can be worked on. I hope you are also successful with the website of Suriname (rechtspraak.sr), which is quite a challenge. Thanks. Kind regards, Jacintha

BilgeKasapoglu commented 2 years ago

Dear @Jacintha777,

Could you guide me on how to search for the keywords on the second website? Is it through "Zoeken"? Thank you

Best Bilge

BilgeKasapoglu commented 2 years ago

Dear @hannesdatta,

I scraped the first website. Usually, there are fewer than 20 results. However, "Caribbean Court of Justice" gives 50 results, and the scraper only gets the first 20. I have come across this problem before, where the results are spread over separate pages, but I never understood how to solve it. Would you please help me?

Also, how should I share the code and files with you? For now, I will send them through Microsoft Teams? Thank you

Best Bilge
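
The usual remedy for results spread over several pages is to request page after page until one comes back empty or shorter than a full page. How this particular site encodes the page (an offset, a page number, or a "next" link) is not stated in the thread, so the sketch below takes a hypothetical `fetch_page(offset)` callable that you would implement for the site; `collect_all_results`, `page_size`, and `max_pages` are likewise illustrative names.

```python
def collect_all_results(fetch_page, page_size=20, max_pages=50):
    """Collect results across pages.

    fetch_page(offset) is a hypothetical callable returning the list of
    results that starts at position `offset` (empty list when exhausted).
    """
    results = []
    for page in range(max_pages):  # hard cap so a misbehaving site cannot loop forever
        batch = fetch_page(page * page_size)
        if not batch:
            break
        results.extend(batch)
        if len(batch) < page_size:
            # A short page means this was the last one.
            break
    return results
```

With 50 hits and 20 per page, this would issue three requests (offsets 0, 20, and 40) and return all 50 items.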

Jacintha777 commented 2 years ago

Dear Bilge,

Thank you for your message. For the second website you can search via "Zoeken", but I would recommend searching via https://rechtspraak.sr/uitspraken-databank/eenvoudig-zoeken/ (see also my comments on GitHub 12 days ago).

For the second website, the keywords to search for are in Dutch, namely: verwijzingsprocedure, Herziene Verdrag van Chaguaramas, Caribisch Hof van Justitie, verwijzing naar het Caribisch Hof van Justitie.

Should you require more information please let me know. Thank you and looking forward to the results.

Kind regards, Jacintha

hannesdatta commented 2 years ago

@BilgeKasapoglu - I checked your Teams message. Thanks a bunch for your work!

Please keep the communication / results etc. on GitHub (all project-related communication needs to be here, not somewhere else).

The main goal here is that @Jacintha777 can run the notebook herself. At Tilburg Science Hub, we don't "DO" the job, but we make/give our colleagues tools so they can do it themselves. Plus we share them online.

Accordingly, please:

Let us NOT ship any Excel files - Jacintha will have to edit the search queries herself.

Please post your updated notebook here for another round of feedback. Alternatively, post the notebook on gist.github.com.

Old version: scrapeCourts.ipynb.zip

Jacintha777 commented 2 years ago

Dear @hannesdatta, thanks for this update. Please don't forget that I have zero knowledge of how to use a scraper. Therefore, guidance from @BilgeKasapoglu might be necessary. Kind regards, Jacintha

hannesdatta commented 2 years ago

@Jacintha777, no worries. Google Colab has a point&click interface & @BilgeKasapoglu can walk you through how to use it (it's really just clicking on it, changing the search query, and waiting for the results to be downloaded). If you get that to run, it's way more useful for you.

Jacintha777 commented 2 years ago

@hannesdatta, thanks and looking forward to the session with @BilgeKasapoglu. Kind regards, Jacintha

BilgeKasapoglu commented 2 years ago

scrapeCourts.ipynb.zip

@Jacintha777 and @hannesdatta,

Here is the most up-to-date version of the notebook. I also created a version on Google Colab and added you two; I hope you can both see it there now. I must say the second website is difficult to work with because it does not search for the keywords as a whole phrase: if I search for "van huizen", it returns all the results containing "van" and "huizen", even when they appear separately. Below, I am adding some additional information for @Jacintha777 on how to get the class names on each website. Thank you

Best, Bilge
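
When a site's search matches the words of a query individually, one workaround is to filter the downloaded texts locally and keep only those containing the exact phrase. The following is a minimal sketch under that assumption; `contains_phrase` and `filter_by_phrase` are hypothetical names, and the documents are assumed to be available as `(title, text)` pairs.

```python
import re


def contains_phrase(text, phrase):
    """True if the words of phrase occur consecutively in text (case-insensitive)."""
    words = phrase.split()
    if not words:
        return False
    # Allow any run of whitespace (including line breaks) between the words.
    # The \b anchors assume the phrase starts and ends with a letter or digit.
    pattern = r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def filter_by_phrase(documents, phrase):
    """Keep only the (title, text) pairs whose text contains the exact phrase."""
    return [(title, text) for title, text in documents if contains_phrase(text, phrase)]
```

This only helps if the relevant documents are among those the site returns in the first place; it narrows noisy results but cannot recover documents the search never surfaced.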

BilgeKasapoglu commented 2 years ago

Dear @Jacintha777 and @hannesdatta,

Below you can find additional information on scraping a website. It is about how to get the class name of the objects that we want to scrape. Let me know if anything is unclear. Thank you

Best Bilge

scrapingAdditionalInfo.pdf
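
The attached PDF is not reproduced in this thread. As a related illustration only, class names can also be listed programmatically from a page's HTML. The sketch below uses Python's built-in `html.parser` as a stand-in for Beautiful Soup; `ClassNameCollector` and `list_class_names` are hypothetical names, and the HTML is assumed to have been downloaded already.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassNameCollector(HTMLParser):
    """Count how often each (tag, class name) pair occurs in a page."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute may hold several space-separated names.
                for class_name in value.split():
                    self.counts[(tag, class_name)] += 1


def list_class_names(html):
    """Return a Counter of (tag, class name) pairs found in the HTML string."""
    collector = ClassNameCollector()
    collector.feed(html)
    return collector.counts
```

Pairs that occur once per search result (for example, 20 times on a page with 20 hits) are good candidates for the elements the scraper should target.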

Jacintha777 commented 2 years ago

@BilgeKasapoglu and @hannesdatta, thank you very much. Today I will not be able to try out the scraping tool because of various meetings. Therefore, I will try it out over the weekend. I will let you know on Monday how it went and whether I need a session with @BilgeKasapoglu to guide me via a Teams meeting.

@hannesdatta @BilgeKasapoglu, I expected the complications with the second website, because I tried searching rechtspraak.sr as well by entering the keywords separately in "eenvoudig zoeken". So I recognize what @BilgeKasapoglu found with the words 'van' and 'huizen'. I also did a similar search on the website of Trinidad & Tobago, and there some of the keywords delivered results. So I am really looking forward to the results of the scraping tool.

Thanks and will update both of you on Monday. Have a good weekend. Kind regards, Jacintha

Jacintha777 commented 2 years ago

Dear @BilgeKasapoglu, I opened the link with Notepad and from there I had no idea how to proceed. I also have another question: how do I get access to Google Colab? It would be very helpful if you could guide me through it. In this regard, I would appreciate a session via Zoom or Teams. Tomorrow I will be at the university, and I am not sure if you also work from there. I am available for a session via Zoom or Teams on Wednesday 23, Thursday 24, or Friday 25 March. Please let me know which date and time is convenient for you. Thank you, Jacintha

BilgeKasapoglu commented 2 years ago

Dear @Jacintha777,

Tomorrow I have a meeting with my supervisors at 3 PM, and I am trying to give them an end result for that meeting. Would it be possible for you to hold the meeting after 16:30? I will be at the university the whole day. Thank you

Best Bilge

Jacintha777 commented 2 years ago

Dear @BilgeKasapoglu, see you tomorrow after 16.30. My office is in the M-building room M312. Good luck with your meeting!

hannesdatta commented 2 years ago

Dear @BilgeKasapoglu, please move the scraper code to a repository where we can actually collaborate on the files. See https://github.com/tilburgsciencehub/onboarding/wiki/Workflow. Any feedback required from me at this stage?

BilgeKasapoglu commented 2 years ago

Dear @hannesdatta,

here is the repo : https://github.com/tilburgsciencehub/courtScraping

Bilge

hannesdatta commented 2 years ago

@BilgeKasapoglu:

@Jacintha777 - please specify a search on the Surinamese site that produces "valid" results.

E.g., we're trying w/ image

But the search results are quite meaningless.

Can you use these results at all?

image

The site just produces results with "van"... not really intended, right?

@Jacintha777 please advise how to go ahead here.

BilgeKasapoglu commented 2 years ago

Dear @hannesdatta ,

@BilgeKasapoglu is my username. I guess you have been @'ing someone else.

Best Bilge

hannesdatta commented 2 years ago

noted ;)

Jacintha777 commented 2 years ago

@hannesdatta and @BilgeKasapoglu, the results with 'van' are indeed useless. I think that the rechtspraak.sr website will not deliver any results. I just tried 'herziene verdrag Chaguaramas' and I get other results not relevant for my research and no results on 'Chaguaramas'. So at least I can say that I tried scraping this website but with no results. many thanks for trying.

@BilgeKasapoglu, with regard to Trinidad and Tobago: do you get the same results of the High Court when searching the Court of Appeal?

BilgeKasapoglu commented 2 years ago

Dear @hannesdatta and @Jacintha777,

I modified and delivered the outcomes to Jacintha last week. We concluded that the second website was not workable. For the first website, I wrote Python code to scrape it. I shared the file with you in a repository and on Google Colab.

Best Bilge

Jacintha777 commented 2 years ago

Dear @hannesdatta and @BilgeKasapoglu,

Thank you once again.

@hannesdatta, I know you are quite busy, but I still would like to enquire about the following: how can the python code find one case and not another? When searching the website of Trinidad and Tobago, the keyword 'referral' delivered the case of Jhamilly Hadeed of 2019. However, there is another case of 2021 which should have been detected when searching for the keyword 'referral'. Do you have an idea or a possible explanation why one case is detected and the other not?

Thank you and looking forward to your reply.

Kind regards, Jacintha

hannesdatta commented 2 years ago

@BilgeKasapoglu, can you comment/have an idea?

BilgeKasapoglu commented 2 years ago

Dear @Jacintha777,

Would you share the exact names of the cases? Also, is the search under High Court or Court of Appeal?

Thank you Bilge

BilgeKasapoglu commented 2 years ago

@Jacintha777, also, which case should have shown up in 2021? Thank you

Jacintha777 commented 2 years ago

Dear @BilgeKasapoglu and @hannesdatta ,

Thank you. So, if I understand correctly, if the word 'referral' is not mentioned in the title or summary of the case on the website, then the scraper will not be able to find it. If so, this answers my question. The conclusion I draw is that the scraper cannot help me detect cases that mention referral only somewhere in the text, rather than in the title or case summary. I had hoped that it could, which would have made things easier for me. Can this be fixed so that it searches everywhere on the website and not only the title and summary?

Thank you once again.

Kind regards, Jacintha

BilgeKasapoglu commented 2 years ago

Dear @Jacintha777,

I do not know how to fix it, and I think the scraper cannot help with this. Maybe @hannesdatta knows more, but it is probably due to the design of the website. Thank you

Best Bilge

hannesdatta commented 2 years ago

Hi all, @Jacintha777, a web scraper captures text that is visible on a website. As the text is not visible on the website, we can't capture it. Our idea was to use the site's search function (right, @BilgeKasapoglu?) to get you the articles, but the search functionality seemed very limited.

My approach would be to use broader search words (which you could change in the Jupyter Notebook) and download all cases, and then use the PDF/text tool we have developed for you earlier to search through this data.

Note that this is a massive undertaking that we can't facilitate at this stage.

What I would say is you try to develop the code from here.

If you want to learn coding in Python, you can also enroll in https://odcm.hannesdatta.com (starting in September), where you will learn Python and scraping.
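
The approach suggested above (search broadly, download everything, then search the full text locally) could end with a step like the one sketched here. It is not the PDF/text tool mentioned in the thread; it assumes the downloaded cases have already been converted to `.txt` files in one folder, and `search_folder` is a hypothetical name.

```python
from pathlib import Path


def search_folder(folder, keywords):
    """Return {file name: [keywords found]} for the .txt files in folder."""
    matches = {}
    for path in sorted(Path(folder).glob("*.txt")):
        # Lower-case both sides for a simple case-insensitive substring match.
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        found = [keyword for keyword in keywords if keyword.lower() in text]
        if found:
            matches[path.name] = found
    return matches
```

Because this looks at the full text of each file, it can find a keyword such as "referral" even when it does not appear in the title or summary that the website's own search covers.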