nmfs-fish-tools / DisMAP

The Distribution Mapping and Analysis Portal (DisMAP) is a national web application that displays spatial distribution data for over 400 marine fish and macroinvertebrate species.
https://apps-st.fisheries.noaa.gov/dismap/index.html

Update Alaska bottom trawl data download and data compile #4

Closed · Melissa-Karp closed this 2 months ago

Melissa-Karp commented 3 months ago

When the API is ready and @Melissa-Karp is ready with the Rscript, I will need @EmilyMarkowitz-NOAA to modify the AK code for downloading (the download_ak.R file) and for merging the various data files within the Compile_Dismap_Current.R file. Please fork this repository and submit a pull request when complete.

Task List

EmilyMarkowitz-NOAA commented 3 months ago

Thanks for this!

✔️ I've shown you how to do pull requests (#3 and #2) and some issue management (https://github.com/nmfs-fish-tools/DisMAP/issues/4).

⌚ Let me know when the scripts are ready and I'll get on it :) In the meantime, I have written a zero-filling script for our FOSS data that we may be able to apply to your code; it is provided as an example in the AFSC GAP team's production data documentation. It will look something like this:

# Load data
library(dplyr)
library(here)
library(readr)
catch <- readr::read_csv(file = here::here("data/gap_products_foss_catch.csv"))[,-1] # remove "row number" column
haul <- readr::read_csv(file = here::here("data/gap_products_foss_haul.csv"))[,-1] # remove "row number" column
species <- readr::read_csv(file = here::here("data/gap_products_foss_species.csv"))[,-1] # remove "row number" column

# come up with full combination of what species should be listed for what hauls/surveys
# for zero-filled data, all species caught in a survey need to have zero or non-zero row entries for a haul
comb <- dplyr::full_join(
  x = dplyr::left_join(catch, haul, by = "HAULJOIN") %>%
    dplyr::select(SURVEY_DEFINITION_ID, SPECIES_CODE) %>%
    dplyr::distinct(),
  y = haul %>%
    dplyr::select(SURVEY_DEFINITION_ID, HAULJOIN) %>%
    dplyr::distinct(), 
  by = "SURVEY_DEFINITION_ID", 
  relationship = "many-to-many"
)

# Join data to make a full zero-filled CPUE dataset
dat <- comb %>% 
  # add species data to the unique species-by-survey table
  dplyr::left_join(species, by = "SPECIES_CODE") %>% 
  # add catch data
  dplyr::full_join(catch, by = c("SPECIES_CODE", "HAULJOIN")) %>% 
  # add haul data
  dplyr::full_join(haul, by = c("SURVEY_DEFINITION_ID", "HAULJOIN")) %>%
  # fill zero-filled rows: species-haul combinations with no catch record come
  # out of the joins as NA, so replace NA with 0 (use is.na(), not is.null(),
  # because these are vectorized column operations)
  dplyr::mutate(
    CPUE_KGKM2 = ifelse(is.na(CPUE_KGKM2), 0, CPUE_KGKM2),
    CPUE_KGHA = CPUE_KGKM2/100, # 1 km2 = 100 ha
    CPUE_NOKM2 = ifelse(is.na(CPUE_NOKM2), 0, CPUE_NOKM2),
    CPUE_NOHA = CPUE_NOKM2/100, # 1 km2 = 100 ha
    COUNT = ifelse(is.na(COUNT), 0, COUNT),
    WEIGHT_KG = ifelse(is.na(WEIGHT_KG), 0, WEIGHT_KG))

Previews of the catch, haul, and species tables (their columns and dimensions) were attached as screenshots in the original comment.
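A quick way to sanity-check the zero-fill is to confirm that every haul ends up with one row for each species recorded anywhere in its survey. A minimal sketch using the comb and dat objects defined above (barring duplicate catch records, the last line should return TRUE):

# number of distinct species recorded in each survey
spp_per_survey <- comb %>%
  dplyr::distinct(SURVEY_DEFINITION_ID, SPECIES_CODE) %>%
  dplyr::count(SURVEY_DEFINITION_ID, name = "n_spp")

# rows per haul in the zero-filled data
rows_per_haul <- dat %>%
  dplyr::count(SURVEY_DEFINITION_ID, HAULJOIN, name = "n_rows")

# every haul should carry one row per species observed in its survey
check <- dplyr::left_join(rows_per_haul, spp_per_survey, by = "SURVEY_DEFINITION_ID")
all(check$n_rows == check$n_spp, na.rm = TRUE)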

Melissa-Karp commented 3 months ago

Hi Em. I have updated the code on GitHub so it's ready for you to start making the necessary edits for the AK data. Ping me if you have any questions about the code. Thanks a bunch!

EmilyMarkowitz-NOAA commented 3 months ago

Excellent! I am still waiting for the FOSS API to get up and running, and will jump on this as soon as it is available. If we get to crunch time, I have a backup plan where I will use the files already downloaded to this Google Drive folder: https://drive.google.com/drive/folders/1NcDCxolMf-drd01vy0_NIhqD1lf3r_Ud?usp=drive_link. Just so I can plan, when is the latest this needs to be completed?

Melissa-Karp commented 3 months ago

Hi Em, I think right now the priority would be working on any needed edits in the Compile script so I can at least run the data update using the downloaded data files, and then we can work on the API download script once the issues get sorted out. So if you can make those edits separate from the API script, I'd do that first. Our goal is to have the data updates online by the end of May.


Melissa-Karp commented 3 months ago

So the latest the script would need to be updated by is May 13th.


EmilyMarkowitz-NOAA commented 3 months ago

Sounds like a plan! I'll work on a download-from-Oracle approach and then will add an API approach if the API is ready in time :) I'll let you know if I have any questions as I dig into this 👷‍♀️
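For the API path, a rough sketch of what the download step might look like once the service is live (the base URL, table names, and the shape of the JSON response below are all placeholders/assumptions, not confirmed API details):

# Sketch only: pull FOSS tables over the web API once it is available.
# The base URL, table names, and "items" element are assumptions to be
# replaced with the real API details.
library(httr)
library(jsonlite)

foss_api <- "https://apps-st.fisheries.noaa.gov/ods/foss/"  # placeholder base URL

get_foss_table <- function(table, limit = 10000) {
  resp <- httr::GET(url = paste0(foss_api, table, "/"),
                    query = list(limit = limit))
  httr::stop_for_status(resp)
  parsed <- jsonlite::fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"))
  parsed$items  # assuming records are returned under an "items" element
}

catch   <- get_foss_table("afsc_groundfish_survey_catch")    # hypothetical table name
haul    <- get_foss_table("afsc_groundfish_survey_haul")     # hypothetical table name
species <- get_foss_table("afsc_groundfish_survey_species")  # hypothetical table name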

Melissa-Karp commented 3 months ago

So, just to clarify: I won't be able to download from Oracle, so I just meant to ignore the direct-download issues for now and make the necessary changes to the Compile script to clean/standardize the data into the DisMAP format using the data you can manually download from FOSS. We can deal with how to make the download via Rscript possible later. So the focus is on the Compile script for now, not the download script. Sorry for the confusion.
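In other words, the Compile edits just need to take the manually downloaded FOSS file and reshape/rename it into the DisMAP column layout. A very rough sketch of that step (the file path, FOSS source column names, and DisMAP target names below are all hypothetical placeholders, not the actual schema):

# Sketch only: map a manually downloaded FOSS extract onto DisMAP-style columns.
# Every path and column name here is a placeholder to be swapped for the real ones.
library(dplyr)
library(readr)

foss_ak <- readr::read_csv("data/foss_ak_download.csv")  # hypothetical file

ak_dismap <- foss_ak %>%
  dplyr::transmute(
    region = "Alaska",            # placeholder region label
    year   = YEAR,                # placeholder FOSS column names below
    haulid = HAULJOIN,
    spp    = SCIENTIFIC_NAME,
    lat    = LATITUDE_DD_START,
    lon    = LONGITUDE_DD_START,
    wtcpue = CPUE_KGKM2           # or whatever CPUE unit the Compile script expects
  )

readr::write_csv(ak_dismap, "data/ak_dismap_ready.csv")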


Melissa-Karp commented 2 months ago

Hi Em. I incorporated your edits into the download_ak and Compile scripts offline, and have pushed those changes to GitHub so they are now visible in this repo. Thanks for the help.