synthetichealth / synthea

Synthetic Patient Population Simulator
https://synthetichealth.github.io/synthea
Apache License 2.0

how to add new names #908

Open vbchinnam opened 3 years ago

vbchinnam commented 3 years ago

Hello, I am trying to add new names. I have 1,000 first names and surnames that need to be added, and I need to generate 10k patients. How do I provide the names so that they are randomized? Do I also need to change the language and ethnicity? If that is mandatory, please let me know how to change them or how to add a new language and ethnicity.

Appreciate your response and time

Thank you

jawalonoski commented 3 years ago

Replace or edit the src/main/resources/names.yml file; it is that straightforward.

Just replace the names under spanish and english if you want to override the names without regard for language.

If you want to add names for a new language (i.e. not Spanish or English) without overwriting the current names, add a language section (e.g. french or chinese, or whatever language you want to add). You then also need to edit the src/main/java/org/mitre/synthea/world/concepts/Names.java file, since it does not pick up new language sections automatically.
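
As a rough illustration of the kind of change to Names.java described above, here is a minimal sketch of a lookup that tries the requested language first and falls back to English, so a new else-if branch is not needed for each added language. It is not Synthea's actual code: the class NameLookupSketch, the method choicesFor, the plain Map standing in for the loaded names.yml data, and the sample names are all hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for the language lookup in Names.java; not Synthea's actual code. */
final class NameLookupSketch {
  /** Language used when the requested one has no entries (mirrors the final else branch). */
  static final String DEFAULT_LANGUAGE = "english";

  /**
   * Returns the list stored under "language.category", falling back to English.
   * The flat "language.category" keys are an assumption modeled on lookups such as
   * names.get("spanish.family") seen in this thread.
   */
  static List<String> choicesFor(
      Map<String, List<String>> names, String language, String category) {
    String lang = (language == null) ? DEFAULT_LANGUAGE : language.toLowerCase(Locale.ROOT);
    List<String> choices = names.get(lang + "." + category);
    if (choices == null || choices.isEmpty()) {
      // Unknown or empty language section: use the English list instead.
      choices = names.get(DEFAULT_LANGUAGE + "." + category);
    }
    return choices;
  }

  public static void main(String[] args) {
    // Made-up sample data; a real run would use the contents of names.yml.
    Map<String, List<String>> names = Map.of(
        "english.family", List.of("Smith", "Jones"),
        "french.family", List.of("Martin", "Bernard"));
    System.out.println(choicesFor(names, "French", "family"));  // [Martin, Bernard]
    System.out.println(choicesFor(names, "russian", "family")); // [Smith, Jones]
  }
}
```

With a lookup like this, adding a language would only require a new section in the YAML file, at the cost of silently falling back to English when a section name is misspelled.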

citizenrich commented 3 years ago

Hi. I've tried out adding new names this morning and modified the names.yml and Names.java files. The generation outputs almost entirely English names, despite there being the same number of names in all languages (elements from the periodic table translated into all UN languages). Is there a tweak I need to make to evenly pick out names from each language?

Edits to Names.java in case I'm making a mistake (likely)... pastebin of names.yml: https://pastebin.com/K18R3rG2

  public static String fakeFirstName(String gender, String language, Person person) {
    List<String> choices;
    if ("spanish".equalsIgnoreCase(language)) {
      choices = (List<String>) names.get("spanish." + gender);
    } else if ("french".equalsIgnoreCase(language)) {
      choices = (List<String>) names.get("french." + gender);
    } else if ("arabic".equalsIgnoreCase(language)) {
      choices = (List<String>) names.get("arabic." + gender);
    } else if ("chinese".equalsIgnoreCase(language)) {
      choices = (List<String>) names.get("chinese." + gender);
    } else if ("russian".equalsIgnoreCase(language)) {
      choices = (List<String>) names.get("russian." + gender);
    } else {
      choices = (List<String>) names.get("english." + gender);
    }
    // ... remainder of the method unchanged ...
  }

  public static String fakeLastName(String language, Person person) {
    List<String> choices;
    if ("spanish".equalsIgnoreCase(language)) {
      choices = (List<String>) names.get("spanish.family");
    } else if ("french".equalsIgnoreCase(language)) {
      choices = (List<String>) names.get("french.family");
    } else if ("arabic".equalsIgnoreCase(language)) {
      choices = (List<String>) names.get("arabic.family");
    } else if ("chinese".equalsIgnoreCase(language)) {
      choices = (List<String>) names.get("chinese.family");
    } else if ("russian".equalsIgnoreCase(language)) {
      choices = (List<String>) names.get("russian.family");
    } else {
      choices = (List<String>) names.get("english.family");
    }
    // ... remainder of the method unchanged ...
  }

jawalonoski commented 3 years ago

If you want all names to have equal probability, the easiest solution is to just shove all the names under english.

What you have done is fine. However, the reason the names do not all have equal probability is that patients who speak a foreign language as their primary language are a significant minority. The language is chosen first, and the name is then drawn from that language's list. If you want to change that, you need to edit the primary language code (honestly, this should probably be in a configuration file somewhere):

https://github.com/synthetichealth/synthea/blob/6ed19abf8ac7870a2c59b9fd734fc72bd32837e5/src/main/java/org/mitre/synthea/world/geography/Demographics.java#L130-L231
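
To make the effect concrete, here is a small simulation of the two-stage behavior described above: a primary language is drawn from a weighted distribution, and only then would a name be picked from that language's list. It is a sketch under assumptions, not the code in Demographics.java: the class LanguageSkewSketch, its methods, and the weights (0.90 / 0.06 / 0.04) are illustrative only and are not Synthea's actual probabilities, which also depend on race and ethnicity.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical simulation of weighted language selection; not Synthea's actual code. */
final class LanguageSkewSketch {
  /** Picks a key with probability proportional to its weight (null if the map is empty). */
  static String pickWeighted(Map<String, Double> weights, Random random) {
    double total = 0.0;
    for (double w : weights.values()) {
      total += w;
    }
    double roll = random.nextDouble() * total;
    String last = null;
    for (Map.Entry<String, Double> e : weights.entrySet()) {
      last = e.getKey();
      roll -= e.getValue();
      if (roll < 0) {
        return last;
      }
    }
    return last; // reached only through floating-point rounding or an empty map
  }

  /** Counts how many simulated patients end up with each primary language. */
  static Map<String, Integer> countLanguages(Map<String, Double> weights, int patients, long seed) {
    Random random = new Random(seed);
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (String lang : weights.keySet()) {
      counts.put(lang, 0);
    }
    for (int i = 0; i < patients; i++) {
      counts.merge(pickWeighted(weights, random), 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    // Illustrative weights only; the real values live in Demographics.java.
    Map<String, Double> weights = new LinkedHashMap<>();
    weights.put("english", 0.90);
    weights.put("spanish", 0.06);
    weights.put("french", 0.04);
    System.out.println(countLanguages(weights, 10_000, 42L));
  }
}
```

Even if every language section in names.yml holds the same number of names, the name lists for minority languages are consulted only as often as those languages are drawn, which is why the output is dominated by English names.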

citizenrich commented 3 years ago

Thanks @jawalonoski I think I understand. When using Other Areas is the class languageFromRaceAndEthnicity still used? If so, is there a way to override it when using international or other locations?

jawalonoski commented 3 years ago

> Thanks @jawalonoski I think I understand. When using Other Areas is the class languageFromRaceAndEthnicity still used? If so, is there a way to override it when using international or other locations?

Yes, it is still used. No, there is currently no way to override that except through code. As I said though, it really should be in a configuration file. We'd be happy to take that as a pull request if you or anyone else wants to make that contribution.
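
For anyone considering that contribution, here is one possible starting point: a sketch that reads language weights from properties-style text and normalizes them into probabilities. Everything in it is an assumption for illustration. The class LanguageConfigSketch, the key prefix demographics.language., and the flat per-language weights do not exist in Synthea, and the real logic also conditions on race and ethnicity, which this sketch ignores.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical loader for configurable language weights; not part of Synthea. */
final class LanguageConfigSketch {
  /** Hypothetical key prefix; Synthea defines no such properties. */
  static final String PREFIX = "demographics.language.";

  /** Parses "demographics.language.NAME = weight" entries and normalizes them to sum to 1. */
  static Map<String, Double> loadLanguageWeights(String propertiesText) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(propertiesText));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    Map<String, Double> weights = new TreeMap<>();
    double total = 0.0;
    for (String key : props.stringPropertyNames()) {
      if (!key.startsWith(PREFIX)) {
        continue; // ignore unrelated settings
      }
      double weight = Double.parseDouble(props.getProperty(key).trim());
      if (weight < 0) {
        throw new IllegalArgumentException("negative weight for " + key);
      }
      weights.put(key.substring(PREFIX.length()), weight);
      total += weight;
    }
    if (total <= 0) {
      throw new IllegalArgumentException("no positive language weights configured");
    }
    for (Map.Entry<String, Double> e : weights.entrySet()) {
      e.setValue(e.getValue() / total);
    }
    return weights;
  }

  public static void main(String[] args) {
    // Equal weights, e.g. for a data set that should use every language evenly.
    String config = String.join("\n",
        "demographics.language.arabic = 1",
        "demographics.language.chinese = 1",
        "demographics.language.english = 1",
        "demographics.language.french = 1",
        "demographics.language.russian = 1",
        "demographics.language.spanish = 1");
    System.out.println(loadLanguageWeights(config));
  }
}
```

Normalizing the weights means a user can write relative values (such as 1 for every language) rather than probabilities that must sum to exactly 1.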

vbchinnam commented 3 years ago

The generated patient names have numbers appended to them. Is it possible to generate the patient names without those numbers?

jawalonoski commented 3 years ago

Set the following property (in src/main/resources/synthea.properties) to false:

# If true, person names have numbers appended to them to make them more obviously fake
generate.append_numbers_to_person_names = true

That being said, I do not recommend that you do this, since the numbers are a good indicator that these people are fake.

vbchinnam commented 3 years ago

Got it. Thank you for the suggestion. Much appreciated.

citizenrich commented 3 years ago

I hope a small follow-up is OK. For names, it looks like the project must be rebuilt after replacing names.yml. Is that also true for demographics/Other Areas, and is there some Gradle/Java trick to load the new files for testing without rebuilding?

jawalonoski commented 3 years ago

Gradle has a feature called compile avoidance, so it only rebuilds the things that have changed.

If you use the ./run_synthea command, it should just pick up the new files (YAML or other configuration settings) without rebuilding everything.