langtonhugh / asreview_irr

Code to automatically produce a report from ASReview on inter-rater reliability.
Apache License 2.0

record_ids don't match: how to solve #4

Closed: j0sien closed this issue 7 months ago

j0sien commented 1 year ago

Hi there,

I tried out your script and it works very well. However, it turns out that the record_ids differ between my export and my colleague's, so I cannot use the results. Is there an automated way to update the record_ids so that they match between the two .csv files (e.g. by matching on title)?
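Something like the sketch below is what I had in mind, if that helps to illustrate it (untested; the file and column names are just guesses):

```r
# Untested sketch: re-key rater 2's export so its record_ids match rater 1's,
# joining on title. File and column names are illustrative only.
library(dplyr)
library(readr)

rater1_df <- read_csv("rater1_export.csv")
rater2_df <- read_csv("rater2_export.csv")

rater2_rekeyed_df <- rater2_df %>%
  select(-record_id) %>%                           # drop rater 2's own record_ids
  left_join(
    rater1_df %>% select(record_id, title),        # bring in rater 1's record_ids
    by = "title"
  )
```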

langtonhugh commented 1 year ago

Pleased to hear it's working nicely.

If I understand the issue correctly, the fix would involve comparing rater 1's and rater 2's coding on something other than record_id. I have added a script to the scripts folder called title_match_example.r which makes the comparison based on the title of the paper rather than the record_id. This is not yet incorporated into the report, but you can easily copy the code over. I will certainly consider adding it as an option in the report.

If your record_id variables do not match, does that mean you loaded different RIS files into ASReview? If so, it is possible that the title matching will not be perfect either. But let me know how you get on.
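For anyone reading along, the first step of that comparison is roughly the sketch below (a simplified illustration rather than the script itself; the `title` and `included` column names may differ depending on your export):

```r
# Simplified illustration of the title-matching idea (not the full
# title_match_example.r script). Column names may differ by export.
library(dplyr)

# Titles that rater 1 flagged as irrelevant / relevant.
first_irrelevant_vec <- first_rec_df %>%
  filter(included == 0) %>%
  pull(title)

first_relevant_vec <- first_rec_df %>%
  filter(included == 1) %>%
  pull(title)

# Rater 2's decisions can then be checked against these vectors by title
# rather than by record_id.
```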

j0sien commented 1 year ago

Running this part of the script:

```r
disagree1_df <- secon_rec_df %>%
  filter(included == 1, primary_title %in% first_irrelevant_vec) %>%
  select(record_id, authors, year, title, abstract) # For newer projects.
  # select(record_id, first_authors, publication_year, primary_title, notes_abstract) # For older projects.
```

gives the following error:

```
Error in `filter()`:
! Problem while computing `..2 = primary_title %in% first_irrelevant_vec`.
Caused by error in `primary_title %in% first_irrelevant_vec`:
! object 'primary_title' not found
Run `rlang::last_error()` to see where the error occurred.
```

j0sien commented 1 year ago

The primary_title issue is solved: the column is now called 'title'.
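So, for a newer project, the line that failed becomes something like this (illustrative, using the column names from my export):

```r
# With the newer column names, filter on `title` instead of `primary_title`.
disagree1_df <- secon_rec_df %>%
  filter(included == 1, title %in% first_irrelevant_vec) %>%
  select(record_id, authors, year, title, abstract)
```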

j0sien commented 1 year ago

```
                             `rater 1` `rater 2`
1  n uploaded                    11579     11579
2  n reviewed                     1364       550
3  % reviewed                     11.8      4.75
4  n flagged relevant              125        99
5  n flagged irrelevant           1239       451
6  % flagged relevant             9.16        18
7  n unreviewed                  10215     11029
8  n total irrelevant            11454     11480
9  % total relevant               1.08      0.85
10 n relevant v. irrelevant          0         0
11 n relevant v. unreviewed          0         0

> agree(full_array)
 Percentage agreement (Tolerance=0)

 Subjects = 11579
   Raters = 2
  %-agree = 99.4

> kappa2(full_array)
 Cohen's Kappa for 2 Raters (Weights: unweighted)

 Subjects = 11579
   Raters = 2
    Kappa = 0.703

        z = 76.1
  p-value = 0
```
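(For context, `full_array` is just the two raters' decisions side by side, and the two statistics come from the irr package; roughly as in the sketch below, where the data frame and column names are illustrative.)

```r
# Rough sketch of how the agreement statistics above are produced with the
# irr package. `full_df` and its columns are illustrative names for a data
# frame holding both raters' 0/1 decisions per record.
library(irr)

full_array <- cbind(
  rater1 = full_df$included_rater1,
  rater2 = full_df$included_rater2
)

agree(full_array)   # percentage agreement (tolerance = 0)
kappa2(full_array)  # Cohen's kappa for 2 raters, unweighted
```
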
j0sien commented 1 year ago

It seems to go wrong when comparing the two. I've checked manually and know that the counts are:


| Bo → Josien | 0 | 1 |
| -- | -- | -- |
| 0 | 11436 | 20 |
| 1 | 44 (of which 35 unseen) | 79 |
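
(The cross-tabulation above can be reproduced with something like the following, assuming a matched data frame with one row per record and both raters' 0/1 decisions; the names are guesses:)

```r
# Sketch of the manual check: cross-tabulate the two raters' decisions after
# matching records. `matched_df` and its columns are illustrative names.
table(
  Bo     = matched_df$included_bo,
  Josien = matched_df$included_josien
)
```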

langtonhugh commented 10 months ago

Sorry for the delay; I have only just returned to this recently. I think this issue has now been solved (see #6). Let me know if you still run into a problem. Thanks a lot.

langtonhugh commented 7 months ago

I think recent updates have resolved these issues, so closing for now.