Sampling - GitHub issues

whyzjuhit commented 11 months ago

Hi, I'm using the rvine function to generate samples, and I find that the samples are generated randomly. However, before sampling I already have some data (sampled by another method), so those values are fixed. I want to start from these data and then use the vine method to generate the other, correlated variables. I think I need a Rosenblatt transformation or something similar. My guess is:

1. I can use the vine function to fit the observations.
2. I can use the rosenblatt function to convert the data generated by the other method to pseudo-observations.
3. Based on the fitted vine and these pseudo-observations, I can use the inverse_rosenblatt function to sample.

Is this right? What else should I do?

tnagler commented 11 months ago

Yes, you probably need the Rosenblatt transform and its inverse. But I'm not quite sure how the already existing data come into play here. Are these data for a subset of the variables? In that case, you want to simulate conditionally on the existing variables. Section 4.2 of this article does this, for example.

whyzjuhit commented 11 months ago

Yes, I want to do conditional sampling. Over the past few days I tried this with rvinecopulib. I found that pseudo_obs applied to a matrix x gives a different result from the rosenblatt function. It seems I can't solve this with rvinecopulib.

tnagler commented 11 months ago

Indeed, pseudo_obs and rosenblatt do different things. The first converts the data to uniform variables without changing the dependence; the second converts them to independent uniform variables. For conditional sampling you need the second.
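To make the difference concrete, here is a small Python sketch (not rvinecopulib code) for a bivariate Gaussian copula whose correlation RHO is assumed to be known. The hypothetical helper to_pseudo_obs uses the usual rank / (n + 1) scaling. The helper rosenblatt_gauss2, also hypothetical, applies the Rosenblatt transform in the order (1, 2) using the closed-form conditional distribution of the Gaussian copula. The pseudo-observations stay strongly correlated, while the Rosenblatt-transformed values are approximately uncorrelated.

```python
import math
import random
from statistics import NormalDist

N = NormalDist()  # standard normal distribution (cdf / inv_cdf)
RHO = 0.8         # assumed Gaussian-copula correlation, treated as known

def to_pseudo_obs(col):
    """Hypothetical helper: ranks scaled to (0, 1) as rank / (n + 1); keeps the dependence."""
    order = sorted(range(len(col)), key=lambda i: col[i])
    u = [0.0] * len(col)
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (len(col) + 1)
    return u

def h(u2, u1, rho=RHO):
    """Conditional CDF C(u2 | u1) of a bivariate Gaussian copula."""
    z1, z2 = N.inv_cdf(u1), N.inv_cdf(u2)
    return N.cdf((z2 - rho * z1) / math.sqrt(1 - rho ** 2))

def rosenblatt_gauss2(u1, u2):
    """Hypothetical helper: Rosenblatt transform for the order (1, 2)."""
    return list(u1), [h(b, a) for a, b in zip(u1, u2)]

def corr(x, y):
    """Pearson correlation using only the standard library."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Simulate correlated data on the original (normal) scale.
random.seed(1)
n = 5000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [RHO * a + math.sqrt(1 - RHO ** 2) * random.gauss(0, 1) for a in x1]

u1, u2 = to_pseudo_obs(x1), to_pseudo_obs(x2)  # uniform margins, same dependence
v1, v2 = rosenblatt_gauss2(u1, u2)             # uniform margins, independent

corr_pseudo = corr(u1, u2)  # strongly positive
corr_rosen = corr(v1, v2)   # close to zero
print(round(corr_pseudo, 2), round(corr_rosen, 2))
```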

whyzjuhit commented 8 months ago

Thank you, Professor Nagler. I've read the paper but found it difficult. I'll keep working on it. Thank you again.

vande02 commented 6 months ago

To follow up on this conversation: Section 4.2 of the attached paper specifies that, to sample conditionally on known values of a subset of variables, the remaining unknown variables should be set to arbitrary values in [0, 1] before applying the Rosenblatt transform (Step 2). Although these arbitrary values are replaced by independent uniformly distributed values after the transform (Step 3), I believe the arbitrary values from Step 2 still influence the Rosenblatt-transformed values of the known variables, which in turn influence the unknown variables when the inverse Rosenblatt transform is applied (Step 6). So which arbitrary values should be used for the unknown variables in Step 2? Thank you in advance for the clarification.
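The procedure summarized above can be sketched in Python for a simple case. This is an illustration under stated assumptions, not the paper's or rvinecopulib's implementation. The model is assumed to be a trivariate Gaussian copula with a known correlation matrix SIGMA (illustrative values), for which the Rosenblatt transform in the order 1, 2, 3 reduces to a Cholesky factorization. All function names are hypothetical. The known variable comes first in the order. The unknowns get an arbitrary value and the whole vector is transformed. The transformed unknowns are then replaced by fresh uniforms, and the inverse transform yields the conditional sample.

```python
import math
import random
from statistics import NormalDist

N = NormalDist()  # standard normal distribution (cdf / inv_cdf)

# Assumed model: Gaussian copula with this correlation matrix (illustrative values).
SIGMA = [[1.0, 0.6, 0.3],
         [0.6, 1.0, 0.5],
         [0.3, 0.5, 1.0]]

def cholesky(a):
    """Lower-triangular L with L @ L^T == a (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L

def rosenblatt_gauss(u, L):
    """Hypothetical helper: Gaussian-copula Rosenblatt transform, order 1..d."""
    z = [N.inv_cdf(x) for x in u]
    w = []
    for i in range(len(z)):  # forward substitution solves L w = z
        w.append((z[i] - sum(L[i][k] * w[k] for k in range(i))) / L[i][i])
    return [N.cdf(x) for x in w]

def inverse_rosenblatt_gauss(v, L):
    """Hypothetical helper: maps independent uniforms back to the copula scale."""
    w = [N.inv_cdf(x) for x in v]
    z = [sum(L[i][k] * w[k] for k in range(i + 1)) for i in range(len(w))]
    return [N.cdf(x) for x in z]

def conditional_sample(u_known, L, rng, arbitrary=0.5):
    """Sample the unknown variables given the known ones (known ones first)."""
    k, d = len(u_known), len(L)
    # Fill the unknowns with an arbitrary value, then apply the transform.
    v = rosenblatt_gauss(list(u_known) + [arbitrary] * (d - k), L)
    # Replace the transformed unknowns by fresh independent uniforms
    # (bounded away from 0 and 1, where inv_cdf is undefined).
    v[k:] = [rng.uniform(1e-12, 1.0 - 1e-12) for _ in range(d - k)]
    # Invert the transform to obtain a sample on the copula scale.
    return inverse_rosenblatt_gauss(v, L)

L = cholesky(SIGMA)
rng = random.Random(42)
u1 = 0.9  # known value of variable 1 on the copula (uniform) scale
draws = [conditional_sample([u1], L, rng) for _ in range(2000)]

# The arbitrary value does not matter because the known variable is processed first.
same = (conditional_sample([u1], L, random.Random(7), arbitrary=0.1)
        == conditional_sample([u1], L, random.Random(7), arbitrary=0.8))
print(same)  # True
```

Every draw reproduces u1 in its first coordinate (up to rounding), while variables 2 and 3 vary around values shifted upward by the positive correlations in SIGMA.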

tnagler commented 6 months ago

The values really are arbitrary: if the structure is set up correctly, they should not influence the transform of the other variables. In particular, the variables you condition on have to be on the top right of the diagonal. As a sanity check, you can use NA as the 'arbitrary value' and check whether the transform produces NAs in the wrong places.
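The point about ordering can be checked numerically in a two-variable toy model. This hedged Python sketch does not use rvinecopulib or its structure matrices. It assumes a bivariate Gaussian copula with RHO = 0.7, where the "structure" reduces to the order in which the Rosenblatt transform processes the variables. Instead of NA, it varies the arbitrary value of the unknown variable and checks whether the transformed value of the known variable changes. It does not change when the known variable is processed first, and it does when the known variable is processed last. The function names are hypothetical.

```python
import math
from statistics import NormalDist

N = NormalDist()  # standard normal distribution (cdf / inv_cdf)
RHO = 0.7         # assumed parameter of a bivariate Gaussian copula

def h(u_b, u_a, rho=RHO):
    """Conditional CDF C(u_b | u_a) of the bivariate Gaussian copula."""
    za, zb = N.inv_cdf(u_a), N.inv_cdf(u_b)
    return N.cdf((zb - rho * za) / math.sqrt(1 - rho ** 2))

def rosenblatt_known_first(u_known, u_other):
    """Order (known, other): the known variable passes through unchanged."""
    return u_known, h(u_other, u_known)

def rosenblatt_known_last(u_known, u_other):
    """Order (other, known): the known variable is transformed given the other."""
    return h(u_known, u_other), u_other

u_known = 0.3
goods, bads = [], []
for arbitrary in (0.2, 0.5, 0.8):  # different 'arbitrary' values for the unknown
    goods.append(rosenblatt_known_first(u_known, arbitrary)[0])
    bads.append(rosenblatt_known_last(u_known, arbitrary)[0])

print(goods)  # identical: the arbitrary value has no influence
print(bads)   # changes with the arbitrary value: wrong ordering
```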

HEY745 commented 5 months ago

Sorry to bother you again, Professor Nagler. Could you clarify what you mean by "the variables you condition on have to be on the top right of the diagonal"? I can think of two interpretations. The first is that the variable data should be placed on the top right of the diagonal (Step 2), but according to the article, the variable data should be placed like that. The second is that it refers to the structure matrix of the vine copula, but I found that the vine copula is the same whether I use the upper or the lower triangle.

tnagler commented 5 months ago

This is a different package (VineCopula instead of rvinecopulib), which uses a different convention for the structure matrix. In VineCopula, the variables you condition on have to be at the bottom right of the diagonal. Please ask questions about VineCopula in its own repository.

vinecopulib / rvinecopulib

Sampling #269