jmelot opened 9 months ago
Hi! Could you explain what you mean by "manually review"? Are you looking for someone to look for relationships between the outputs?
Hello! The goal of this issue is to help surface errors in our code that links software to the organizations of its contributors. What we're looking for here is for someone to:
1. Go through `software_to_ror.csv` and, for each of the methods described in our README (such as `url_matches`, `ner_text_extraction`, and so on), count the number of times each ROR id was returned by that method. You can find the ROR ids in the `ror_id` column and the methods in the `extraction_methods` column.
2. Manually review a sample of the linkages found by each method. For example, for the `url_matches` method, you would randomly select 20 rows from `software_to_ror.csv` where Google's ROR id was in the `ror_id` column and `url_matches` was in the `extraction_methods` column. You would then manually review each repo to see if you could find evidence that someone who worked at Google also worked on that piece of software. For easier review, it would be a good idea to select only rows where the `github_slug` column is populated.
3. Record your findings in a csv with the columns `ror_id`, `github_slug`, `extraction_methods`, `is_valid`, and `notes`. The first three columns are taken from `software_to_ror.csv`; `is_valid` is `true` if you were able to find evidence linking the organization to the software and `false` otherwise; and `notes` holds notes on your findings. This csv could be placed in a new `validation` directory at the top level of this project.

Are you interested in working on this? If so, let me know if you need any help getting set up with the first step, need more details on how to review software repos for evidence that someone at an organization works on that software, or have any further questions.
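As a rough starting point for the counting and sampling steps, here is a minimal Python sketch using only the standard library. It assumes that one `extraction_methods` cell can list several methods separated by `;`. That delimiter is a guess, so check the actual file. The helper names (`load_rows`, `split_methods`, `count_ror_ids_by_method`, `sample_rows`) are hypothetical and not part of this project.

```python
import csv
import random
from collections import Counter, defaultdict

# ASSUMPTION: one extraction_methods cell may list several methods.
# The ";" delimiter is a guess; check software_to_ror.csv and adjust.
METHOD_DELIMITER = ";"


def split_methods(cell):
    """Split one extraction_methods cell into stripped, non-empty method names."""
    return [m.strip() for m in (cell or "").split(METHOD_DELIMITER) if m.strip()]


def load_rows(path="software_to_ror.csv"):
    """Read the CSV into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def count_ror_ids_by_method(rows):
    """Map each method to a Counter of how many rows link it to each ROR id."""
    counts = defaultdict(Counter)
    for row in rows:
        # set() so a method repeated within one cell counts the row only once
        for method in set(split_methods(row["extraction_methods"])):
            counts[method][row["ror_id"]] += 1
    return counts


def sample_rows(rows, ror_id, method, k=20, seed=0):
    """Randomly pick up to k rows where `method` linked the software to
    `ror_id`, keeping only rows whose github_slug is populated."""
    candidates = [
        row
        for row in rows
        if row["ror_id"] == ror_id
        and method in split_methods(row["extraction_methods"])
        and (row.get("github_slug") or "").strip()
    ]
    rng = random.Random(seed)  # fixed seed -> reproducible sample
    return rng.sample(candidates, min(k, len(candidates)))
```

For example, `count_ror_ids_by_method(load_rows())["url_matches"].most_common(10)` would list the ten ROR ids most often returned by `url_matches`. `sample_rows(rows, ror_id=..., method="url_matches")` would then draw up to 20 rows with a `github_slug` for one of them. A fixed `seed` keeps the sample reproducible, so someone else can re-check the same rows.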
This may help us identify common spurious linkages. If you would like to work on this but need help getting started, please comment on this issue!
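For the results file in step 3, a similar sketch could copy the first three columns of each sampled row into the validation layout. It leaves `is_valid` and `notes` empty for the reviewer to fill in by hand. The function names and any output file name are hypothetical; only the column list and the idea of a top-level `validation` directory come from this issue.

```python
import csv
import io
from pathlib import Path

# Column order for the validation file, as described in the issue.
VALIDATION_COLUMNS = ["ror_id", "github_slug", "extraction_methods", "is_valid", "notes"]


def to_validation_rows(sampled_rows):
    """Copy the first three columns from software_to_ror.csv rows and leave
    is_valid / notes blank for the reviewer to fill in by hand."""
    return [
        {
            "ror_id": row["ror_id"],
            "github_slug": row["github_slug"],
            "extraction_methods": row["extraction_methods"],
            "is_valid": "",  # reviewer sets true or false
            "notes": "",     # reviewer records the evidence found (or not found)
        }
        for row in sampled_rows
    ]


def render_validation_csv(rows):
    """Serialize validation rows to CSV text, header line first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=VALIDATION_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def save_validation_csv(rows, path):
    """Write the CSV, creating the parent directory (e.g. validation/) if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_validation_csv(rows), encoding="utf-8")
    return path
```

Something like `save_validation_csv(to_validation_rows(sample), "validation/url_matches_sample.csv")` would produce a file ready for review. That file name is only an example.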