Closed amuthan-sakthivel closed 1 year ago
@npatki - Thanks for answering. We basically want to create synthetic test data using the prod database as a reference. Each collection has a different schema, and it seems we might have to spend a lot of effort on these 2 activities,
Hi @amuthan-sakthivel, my pleasure. From my experience helping other users, this is not too much work as long as you understand what the SDV library expects. There are also some functions available to you for convenience.
We're happy to help here too. Do you have an example of different collections and how their data schemas are different?
Hi @amuthan-sakthivel, do you have anything further to discuss around this topic?
Since this issue has been inactive for a few weeks, I'm closing it off as answered. Please feel free to reply if there are any follow-ups. I can always reopen the issue.
Hi @amuthan-sakthivel, nice to meet you. I am not an expert in MongoDB but I know that you can make the SDV work for most schemaless databases.
The basic requirement is that your data points have similar fields, and that the fields should be basic ones such as numbers, strings, etc. (SDV does not currently handle rich media such as audio or images).
As long as those conditions are met, you should be able to express the data in a tabular format that the SDV accepts. As a simple example, consider that your records may be in a format like this:
Then it can be converted to a tabular format such as:
The SDV accepts this format and creates synthetic data for it. (Notice that if some fields are missing, you always mark them as None in the table.) Let me know if that answers your question!
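
As a rough sketch of the conversion described above: the snippet below flattens a list of schemaless records into columns and rows, filling any missing field with None. The records, their field names (`name`, `age`, `city`, `plan`), and the helper `records_to_table` are hypothetical and only for illustration; they are not part of the SDV.

```python
# Hypothetical records, standing in for documents read from a MongoDB collection.
records = [
    {"name": "Alice", "age": 34, "city": "Boston"},
    {"name": "Bob", "age": 27},
    {"name": "Carol", "city": "Denver", "plan": "premium"},
]


def records_to_table(records):
    """Flatten dict records into (columns, rows), using None for missing fields."""
    # Collect every field name seen in any record, in first-seen order.
    columns = []
    for record in records:
        for field in record:
            if field not in columns:
                columns.append(field)
    # dict.get returns None when a record does not have the field.
    rows = [[record.get(field) for field in columns] for record in records]
    return columns, rows


columns, rows = records_to_table(records)
print(columns)
for row in rows:
    print(row)
```

If pandas is installed, the result can be loaded with `pandas.DataFrame(rows, columns=columns)`, which gives a single table of the kind described above.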