Open qjhart opened 5 years ago
Trying to figure this out and I have a question. The output contains:
"In ENTRY_NAME$name_id != name_output$name_id : longer object length is not a multiple of shorter object length"
I think this shouldn’t happen if you only have one file. ENTRY_NAME$name_id should be equal to name_output$name_id (and it is when I run the file), which means their lengths should definitely be equal. Can you inspect those and see the difference?
I will try to get your test-one.sh working for me for future troubleshooting.
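One way to inspect the difference is to compare the lengths and then list the IDs that appear in only one vector. This is a minimal sketch: `entry_ids` and `output_ids` are hypothetical stand-ins for ENTRY_NAME$name_id and name_output$name_id, with made-up contents. It avoids `!=` on purpose, since R recycles the shorter vector in that comparison and only warns.

```r
# Hypothetical stand-ins for ENTRY_NAME$name_id and name_output$name_id
entry_ids  <- c(paste0("d7js34-015_1_", 1:3), "d7js34-015_data1_1_1")
output_ids <- paste0("d7js34-015_1_", 1:3)

# Check the lengths before any element-wise comparison
same_length <- length(entry_ids) == length(output_ids)

# IDs present in one vector but not in the other
extra_in_entry  <- setdiff(entry_ids, output_ids)
extra_in_output <- setdiff(output_ids, entry_ids)

print(same_length)
print(extra_in_entry)
print(extra_in_output)
```

If `extra_in_entry` is non-empty, those are the entries that make the two vectors differ in length.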
On Jun 18, 2019, at 3:46 PM, Quinn Hart notifications@github.com wrote:
In ENTRY_NAME$name_id != name_output$name_id : longer object length is not a multiple of shorter object length
Here are the name_ids. ENTRY_NAME has some _data entries in there?
[1] "ENTRY_NAME"
[1] "d7js34-015_1_1" "d7js34-015_1_2" "d7js34-015_1_3"
[4] "d7js34-015_1_4" "d7js34-015_1_5" "d7js34-015_1_6"
[7] "d7js34-015_1_7" "d7js34-015_1_8" "d7js34-015_1_9"
[10] "d7js34-015_1_10" "d7js34-015_1_11" "d7js34-015_1_12"
[13] "d7js34-015_1_13" "d7js34-015_1_14" "d7js34-015_1_15"
[16] "d7js34-015_1_16" "d7js34-015_1_17" "d7js34-015_1_18"
[19] "d7js34-015_1_19" "d7js34-015_1_20" "d7js34-015_1_21"
[22] "d7js34-015_1_22" "d7js34-015_1_23" "d7js34-015_1_24"
[25] "d7js34-015_1_25" "d7js34-015_1_26" "d7js34-015_1_27"
[28] "d7js34-015_1_28" "d7js34-015_1_29" "d7js34-015_1_30"
[31] "d7js34-015_1_31" "d7js34-015_1_32" "d7js34-015_1_33"
[34] "d7js34-015_1_34" "d7js34-015_1_35" "d7js34-015_1_36"
[37] "d7js34-015_2_1" "d7js34-015_2_2" "d7js34-015_2_3"
[40] "d7js34-015_2_4" "d7js34-015_2_5" "d7js34-015_2_6"
[43] "d7js34-015_2_7" "d7js34-015_2_8" "d7js34-015_2_9"
[46] "d7js34-015_2_10" "d7js34-015_2_11" "d7js34-015_2_12"
[49] "d7js34-015_2_13" "d7js34-015_2_14" "d7js34-015_2_15"
[52] "d7js34-015_2_16" "d7js34-015_2_17" "d7js34-015_2_18"
[55] "d7js34-015_2_19" "d7js34-015_2_20" "d7js34-015_2_21"
[58] "d7js34-015_2_22" "d7js34-015_2_23" "d7js34-015_2_24"
[61] "d7js34-015_2_25" "d7js34-015_2_26" "d7js34-015_2_27"
[64] "d7js34-015_2_28" "d7js34-015_2_29" "d7js34-015_2_30"
[67] "d7js34-015_2_31" "d7js34-015_2_32" "d7js34-015_2_33"
[70] "d7js34-015_2_34" "d7js34-015_2_35" "d7js34-015_2_36"
[73] "d7js34-015_2_37" "d7js34-015_2_38" "d7js34-015_2_39"
[76] "d7js34-015_2_40" "d7js34-015_2_41" "d7js34-015_2_42"
[79] "d7js34-015_2_43" "d7js34-015_2_44" "d7js34-015_data1_1_1"
[82] "d7js34-015_data1_1_2" "d7js34-015_data1_0_1" "d7js34-015_data1_0_2"
[1] "name_output"
[1] "d7js34-015_1_1" "d7js34-015_1_2" "d7js34-015_1_3" "d7js34-015_1_4"
[5] "d7js34-015_1_5" "d7js34-015_1_6" "d7js34-015_1_7" "d7js34-015_1_8"
[9] "d7js34-015_1_9" "d7js34-015_1_10" "d7js34-015_1_11" "d7js34-015_1_12"
[13] "d7js34-015_1_13" "d7js34-015_1_14" "d7js34-015_1_15" "d7js34-015_1_16"
[17] "d7js34-015_1_17" "d7js34-015_1_18" "d7js34-015_1_19" "d7js34-015_1_20"
[21] "d7js34-015_1_21" "d7js34-015_1_22" "d7js34-015_1_23" "d7js34-015_1_24"
[25] "d7js34-015_1_25" "d7js34-015_1_26" "d7js34-015_1_27" "d7js34-015_1_28"
[29] "d7js34-015_1_29" "d7js34-015_1_30" "d7js34-015_1_31" "d7js34-015_1_32"
[33] "d7js34-015_1_33" "d7js34-015_1_34" "d7js34-015_1_35" "d7js34-015_1_36"
[37] "d7js34-015_2_1" "d7js34-015_2_2" "d7js34-015_2_3" "d7js34-015_2_4"
[41] "d7js34-015_2_5" "d7js34-015_2_6" "d7js34-015_2_7" "d7js34-015_2_8"
[45] "d7js34-015_2_9" "d7js34-015_2_10" "d7js34-015_2_11" "d7js34-015_2_12"
[49] "d7js34-015_2_13" "d7js34-015_2_14" "d7js34-015_2_15" "d7js34-015_2_16"
[53] "d7js34-015_2_17" "d7js34-015_2_18" "d7js34-015_2_19" "d7js34-015_2_20"
[57] "d7js34-015_2_21" "d7js34-015_2_22" "d7js34-015_2_23" "d7js34-015_2_24"
[61] "d7js34-015_2_25" "d7js34-015_2_26" "d7js34-015_2_27" "d7js34-015_2_28"
[65] "d7js34-015_2_29" "d7js34-015_2_30" "d7js34-015_2_31" "d7js34-015_2_32"
[69] "d7js34-015_2_33" "d7js34-015_2_34" "d7js34-015_2_35" "d7js34-015_2_36"
[73] "d7js34-015_2_37" "d7js34-015_2_38" "d7js34-015_2_39" "d7js34-015_2_40"
[77] "d7js34-015_2_41" "d7js34-015_2_42" "d7js34-015_2_43" "d7js34-015_2_44"
I think this comes from parseFolder not effectively filtering out data1 objects if they're in the same folder as the .RDS output. I just committed an adjustment to the regex for that which should fix it, but let me know if the _data entries are still there. (https://github.com/ucd-library/wine-price-extraction/commit/4e67aba3b8a109e5ac9538b20ff3a168e62be005#diff-3c565c483b1f64ad72b8e506bf482b1d)
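For illustration only, filtering of this kind could look roughly like the sketch below. The pattern and the `stems` vector are assumptions chosen to match the `_data1_` names in the output above; they are not the regex from the linked commit.

```r
# Hypothetical name stems; the real ones come from the parsed folder
stems <- c("d7js34-015_1_1", "d7js34-015_2_44",
           "d7js34-015_data1_1_1", "d7js34-015_data1_0_2")

# Assumed pattern: "_data" followed by digits and an underscore marks a data1 object
is_data <- grepl("_data[0-9]+_", stems)

# Keep only the entries that are not data1 objects
kept <- stems[!is_data]
print(kept)
```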
@jcarlen, we discussed the fact that the above fix was not a complete fix. Here are the results of running this for the same item as above, d7js34-015. It's still the same error as before. I hope you were able to get the test-one.sh script working.
dsi/scripts/test-one.sh d7js34-015
....
Loading required package: ggplot2
[1] "truth.dir=/opt/dsi/Data" "in=d7js34-015/parsed_folder.RDS"
[1] "ENTRY_NAME"
[1] "d7js34-015_1_1" "d7js34-015_1_2" "d7js34-015_1_3" "d7js34-015_1_4"
[5] "d7js34-015_1_5" "d7js34-015_1_6" "d7js34-015_1_7" "d7js34-015_1_8"
[9] "d7js34-015_1_9" "d7js34-015_1_10" "d7js34-015_1_11" "d7js34-015_1_12"
[13] "d7js34-015_1_13" "d7js34-015_1_14" "d7js34-015_1_15" "d7js34-015_1_16"
[17] "d7js34-015_1_17" "d7js34-015_1_18" "d7js34-015_1_19" "d7js34-015_1_20"
[21] "d7js34-015_1_21" "d7js34-015_1_22" "d7js34-015_1_23" "d7js34-015_1_24"
[25] "d7js34-015_1_25" "d7js34-015_1_26" "d7js34-015_1_27" "d7js34-015_1_28"
[29] "d7js34-015_1_29" "d7js34-015_1_30" "d7js34-015_1_31" "d7js34-015_1_32"
[33] "d7js34-015_1_33" "d7js34-015_1_34" "d7js34-015_1_35" "d7js34-015_1_36"
[37] "d7js34-015_2_1" "d7js34-015_2_2" "d7js34-015_2_3" "d7js34-015_2_4"
[41] "d7js34-015_2_5" "d7js34-015_2_6" "d7js34-015_2_7" "d7js34-015_2_8"
[45] "d7js34-015_2_9" "d7js34-015_2_10" "d7js34-015_2_11" "d7js34-015_2_12"
[49] "d7js34-015_2_13" "d7js34-015_2_14" "d7js34-015_2_15" "d7js34-015_2_16"
[53] "d7js34-015_2_17" "d7js34-015_2_18" "d7js34-015_2_19" "d7js34-015_2_20"
[57] "d7js34-015_2_21" "d7js34-015_2_22" "d7js34-015_2_23" "d7js34-015_2_24"
[61] "d7js34-015_2_25" "d7js34-015_2_26" "d7js34-015_2_27" "d7js34-015_2_28"
[65] "d7js34-015_2_29" "d7js34-015_2_30" "d7js34-015_2_31" "d7js34-015_2_32"
[69] "d7js34-015_2_33" "d7js34-015_2_34" "d7js34-015_2_35" "d7js34-015_2_36"
[73] "d7js34-015_2_37" "d7js34-015_2_38" "d7js34-015_2_39" "d7js34-015_2_40"
[77] "d7js34-015_2_41" "d7js34-015_2_42" "d7js34-015_2_43" "d7js34-015_2_44"
[1] "name_output"
[1] "d7js34-015_1_1" "d7js34-015_1_2" "d7js34-015_1_3" "d7js34-015_1_4"
[5] "d7js34-015_1_5" "d7js34-015_1_6" "d7js34-015_1_7" "d7js34-015_1_8"
[9] "d7js34-015_1_9" "d7js34-015_1_10" "d7js34-015_1_11" "d7js34-015_1_12"
[13] "d7js34-015_1_13" "d7js34-015_1_14" "d7js34-015_1_15" "d7js34-015_1_16"
[17] "d7js34-015_1_17" "d7js34-015_1_18" "d7js34-015_1_19" "d7js34-015_1_20"
[21] "d7js34-015_1_21" "d7js34-015_1_22" "d7js34-015_1_23" "d7js34-015_1_24"
[25] "d7js34-015_1_25" "d7js34-015_1_26" "d7js34-015_1_27" "d7js34-015_1_28"
[29] "d7js34-015_1_29" "d7js34-015_1_30" "d7js34-015_1_31" "d7js34-015_1_32"
[33] "d7js34-015_1_33" "d7js34-015_1_34" "d7js34-015_1_35" "d7js34-015_1_36"
[37] "d7js34-015_2_1" "d7js34-015_2_2" "d7js34-015_2_3" "d7js34-015_2_4"
[41] "d7js34-015_2_5" "d7js34-015_2_6" "d7js34-015_2_7" "d7js34-015_2_8"
[45] "d7js34-015_2_9" "d7js34-015_2_10" "d7js34-015_2_11" "d7js34-015_2_12"
[49] "d7js34-015_2_13" "d7js34-015_2_14" "d7js34-015_2_15" "d7js34-015_2_16"
[53] "d7js34-015_2_17" "d7js34-015_2_18" "d7js34-015_2_19" "d7js34-015_2_20"
[57] "d7js34-015_2_21" "d7js34-015_2_22" "d7js34-015_2_23" "d7js34-015_2_24"
[61] "d7js34-015_2_25" "d7js34-015_2_26" "d7js34-015_2_27" "d7js34-015_2_28"
[65] "d7js34-015_2_29" "d7js34-015_2_30" "d7js34-015_2_31" "d7js34-015_2_32"
[69] "d7js34-015_2_33" "d7js34-015_2_34" "d7js34-015_2_35" "d7js34-015_2_36"
[73] "d7js34-015_2_37" "d7js34-015_2_38" "d7js34-015_2_39" "d7js34-015_2_40"
[77] "d7js34-015_2_41" "d7js34-015_2_42" "d7js34-015_2_43" "d7js34-015_2_44"
[1] 0
[1] "dictionary_hits" "dictionary_hits_sim"
Error in data.frame(text = "", confidence = 0, name_id = ENTRY_NAME$name_id[i], :
arguments imply differing number of rows: 1, 0
Calls: lapply -> lapply -> FUN -> data.frame
Execution halted
I'm not able to replicate this problem locally. I think it's caused by ENTRY_NAME$name_id not having an ith entry in some case (where i is between 1 and length(NAME_MATCH)), but I'm not sure beyond that.
I'm also not able to get test-one.sh to run, either locally or with docker. I'm new to docker, so would you or Justin be able to help me troubleshoot my setup?
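As a note on the data.frame error above: in base R, indexing a character vector past its end returns a length-1 NA, which data.frame() accepts, while a zero-length value produces the "differing number of rows: 1, 0" message. The sketch below shows both cases with a hypothetical `ids` vector standing in for ENTRY_NAME$name_id. Whether the script actually hits the zero-length case is not confirmed here.

```r
# Hypothetical stand-in for ENTRY_NAME$name_id
ids <- c("a_1", "a_2")

# Out-of-range index gives a length-1 NA, so data.frame() succeeds with one row
ok <- data.frame(text = "", confidence = 0, name_id = ids[5])

# Zero-length value (here via index 0) triggers the error seen in the log
msg <- tryCatch(
  {
    data.frame(text = "", confidence = 0, name_id = ids[0])
    "no error"
  },
  error = function(e) conditionMessage(e)
)

print(nrow(ok))
print(msg)
```

A check such as `length(ENTRY_NAME$name_id[i]) == 0` just before the data.frame() call might help narrow down which entry triggers it.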
@jcarlen, hmm, for whatever reason, applying the updates for the boxes seems to have fixed this issue, at least for the example above :)
@jrmerz, if you have a chance to touch base w/ @jcarlen re. getting her docker config set and running the test-one.sh file, that would be great.
When running the code one file at a time, you occasionally see this problem. Earlier in the code, the exclude1 parameter is defined: https://github.com/ucd-library/wine-price-extraction/blob/9c667ecb83c7c6fbdf790cb50c8f820ea4a0f068/dsi/scripts/run_wine_database_one_page.R#L96
Later in the code, this is used to get the text_conf column: https://github.com/ucd-library/wine-price-extraction/blob/9c667ecb83c7c6fbdf790cb50c8f820ea4a0f068/dsi/scripts/run_wine_database_one_page.R#L152
However, not all pages include text_conf in the exclude1 parameter. For example, we get a failure with some pages, e.g. d7js34-015, UCD_Lehmann_3372, where the code is dying at:
Warning message:
In ENTRY_NAME$name_id != name_output$name_id : longer object length is not a multiple of shorter object length
[1] "dictionary_hits" "dictionary_hits_sim"
Error in data.frame(text = "", confidence = 0, name_id = ENTRY_NAME$name_id[i], :
arguments imply differing number of rows: 1, 0
Calls: lapply -> lapply -> FUN -> data.frame
Execution halted
Note, if you add the line
before the NAME_MATCH, then these errors seem to be okay. Not sure if that's the best solution.