tmatta / lsasim

Simulate large scale assessment data
6 stars 5 forks source link

example gives warning #48

Closed pdbailey0 closed 1 year ago

pdbailey0 commented 1 year ago

Hi,

I ran your example in item_sim and it gave an odd warning. This is from the version 2.1.2

item_gen(b_bounds = c(-2, 2), a_bounds = c(.75, 1.25),
thresholds = c(1, 2, 3), n_1pl = c(5, 5, 5), n_2pl = c(0, 0, 5))
# Warning message:
# In rbind(c(0, 0, 0), c(0, 0, 0), c(0, 0, 0), c(0, 0, 0), c(0, 0,  :
#   number of columns of result is not a multiple of vector length (arg 6)
wleoncio commented 1 year ago

Hi Paul,

Thank you for reporting. I can confirm the issue, will investigate and get back to you soon.

wleoncio commented 1 year ago

So the issue itself looks easy to explain: near the end of the function, item_gen() tries to use rbind() on vectors of different sizes. This seems to happen whenever the thresholds argument contains a vector of values that are not a multiple of one another. For example, thresholds = c(1, 3) and c(2, 4, 8) would work because of how R recycles vectors, whereas c(2, 4, 6) (and the c(1, 2, 3) from the example) won't.

I'm not familiar with the idea of thresholds in this context, but I expected the d* columns in the output should equal the number of thresholds for that particular item (filling with 0.00 for the missing thresholds). Consider the example below:

item_gen(b_bounds = c(-1, 1), thresholds = c(1, 2, 4), n_1pl = c(1, 1, 1))

Depending on RNGesus' will, your result should be something like:

  item     b    d1    d2    d3   d4 a c k p
1    1 -0.46  0.00  0.00  0.00 0.00 1 0 1 1
2    2  0.07 -0.14  0.14 -0.14 0.14 1 0 2 1
3    3  0.66 -0.59 -0.01  0.17 0.43 1 0 4 1

Two things I don't understand:

  1. Why does item 1 has d1 = d2 = 0? If that item has 1 threshold, shouldn't it have a non-zero value for d1?
  2. Item 2 has d1 = d3 and d2 = d4 due to vector recycling. Shouldn't it have 4 different threshold values?

The fix could be as simple as padding the vectors with zeros and generating extra d-values where needed. I would like to get some confirmation that that's the correct solution before proceeding, though.

mollyolaf commented 1 year ago

Hi Waldir!

What is the error? Otherwise, the first item won't have thresholds (d_j = 0) since just 1 is specified and that is captured in b.

Leslie

On Fri, Mar 24, 2023 at 4:22 AM Waldir Leoncio @.***> wrote:

So the issue itself looks easy to explain: near the end of the function, item_gen() tries to use rbind() on vectors of different sizes. This seems to happen whenever the thresholds argument contains a vector of values that are not a multiple of one another. For example, thresholds = c(1, 3) and c(2, 4, 8) would work because of how R recycles vectors, whereas c(2, 4, 6) (and the c(1, 2, 3) from the example) won't.

I'm not familiar with the idea of thresholds in this context, but I suppose the d output should equal the number of thresholds for that particular item (filling with 0.00 for the missing thresholds). Consider the example below:

item_gen(b_bounds = c(-1, 1), thresholds = c(1, 2, 4), n_1pl = c(1, 1, 1))

Depending on RNGesus' will, your result should be something like:

item b d1 d2 d3 d4 a c k p 1 1 -0.46 0.00 0.00 0.00 0.00 1 0 1 1 2 2 0.07 -0.14 0.14 -0.14 0.14 1 0 2 1 3 3 0.66 -0.59 -0.01 0.17 0.43 1 0 4 1

Two things I don't understand:

  1. Why does item 1 has d1 = d2 = 0? If that item has 1 threshold, shouldn't it have a non-zero value for d1?
  2. Item 2 has d1 = d3 and d2 = d4 due to vector recycling. Shouldn't it have 4 different threshold values?

The fix could be as simple as padding the vectors with zeros and generating extra d-values where needed. I would like to get some confirmation that that's the correct solution before proceeding, though.

— Reply to this email directly, view it on GitHub https://github.com/tmatta/lsasim/issues/48#issuecomment-1482421752, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACSQOEFPBS4DG7RJXAOT5GDW5VKUFANCNFSM6AAAAAAWF3LFOU . You are receiving this because you are subscribed to this thread.Message ID: @.***>

wleoncio commented 1 year ago

Hi Leslie,

The error message only presents itself if item_gen() is asked to generate items with different numbers of thresholds that are not multiples of each other. So if I understood correctly, in the example above:

  1. Item 1 should not have any threshold values (which is correct)
  2. Item 3 should have 4 unique threshold values (which is also correct)

What about item 2? Shouldn't it only have values for d1 and d2?

mollyolaf commented 1 year ago

Yes, all correct. And item 2 should have values for only d1 and d2. Not sure if you noticed, but they are in fact repeats for d3 and d4: -0.14 0.14 -0.14 0.14.

So, looks like for whatever reason, the generating process is just recycling thresholds even though there should be just 2 of them. The "k" specification is even correct for item 2 where k=2.

Leslie

On Mon, Mar 27, 2023 at 8:58 AM Waldir Leoncio @.***> wrote:

Hi Leslie,

The error message only presents itself if item_gen() is asked to generate items with different numbers of thresholds that are not multiples of each other. So if I understood correctly, in the example above:

  1. Item 1 should not have any threshold values (which is correct)
  2. Item 3 should have 4 unique threshold values (which is also correct)

What about item 2? Shouldn't it only have values for d1 and d2?

— Reply to this email directly, view it on GitHub https://github.com/tmatta/lsasim/issues/48#issuecomment-1485066998, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACSQOEGIPBJATYWJVT65OZLW6GFG7ANCNFSM6AAAAAAWF3LFOU . You are receiving this because you commented.Message ID: @.***>

wleoncio commented 1 year ago

Thank you for the clarification, @mollyolaf! :)

I noticed the vector recycling, the explanation for the behavior is in my comment above, hope it's clear enough.

This week got a bit rushed with another package I'm trying to get into CRAN before I leave for vacation on Friday, but I think I can also find time to fix this. Worst case, I'll tacke it right after Easter break. :)

wleoncio commented 1 year ago

Good news: got some free time here and pushed a solution. Technically, we just pad the missing threshold values with zeros so the matrix is complete (while also preventing R from recycling values). See code here for more details.

Now I'm working on some unrelated fixes (it's no longer recommended to use class() inside if-statements); as soon as this is done, I'll submit the fix to CRAN.

wleoncio commented 1 year ago

So lsasim version 2.1.3 containing the fix to this issue is on its way to CRAN, so I'll close this as resolved. Thanks again for your contribution, please let me know if the issue persists.