liberu-genealogy / php-dna

DNA processing and manipulating for PHP 8.3
https://www.liberu.co.uk
MIT License
31 stars 23 forks source link

Sweep: Refactor generally to improve quality the file visualization.php and maintainbility plus readable by following psr 1, psr 2, psr 12 standards #147

Closed curtisdelicata closed 5 months ago

curtisdelicata commented 5 months ago
Checklist - [X] Modify `src/Snps/VariedicInherit.php` ✓ https://github.com/liberu-genealogy/php-dna/commit/f4afb22a3ce1353f8d1dbcc9b79139a1c79c643d [Edit](https://github.com/liberu-genealogy/php-dna/edit/sweep/refactor_generally_to_improve_quality_th_79eb7/src/Snps/VariedicInherit.php) - [X] Modify `src/Snps/VariedicInherit.php` ✓ https://github.com/liberu-genealogy/php-dna/commit/f4afb22a3ce1353f8d1dbcc9b79139a1c79c643d [Edit](https://github.com/liberu-genealogy/php-dna/edit/sweep/refactor_generally_to_improve_quality_th_79eb7/src/Snps/VariedicInherit.php) - [X] Modify `src/Snps/VariedicInherit.php` ✓ https://github.com/liberu-genealogy/php-dna/commit/f4afb22a3ce1353f8d1dbcc9b79139a1c79c643d [Edit](https://github.com/liberu-genealogy/php-dna/edit/sweep/refactor_generally_to_improve_quality_th_79eb7/src/Snps/VariedicInherit.php) - [X] Modify `src/Snps/VariedicInherit.php` ✓ https://github.com/liberu-genealogy/php-dna/commit/f4afb22a3ce1353f8d1dbcc9b79139a1c79c643d [Edit](https://github.com/liberu-genealogy/php-dna/edit/sweep/refactor_generally_to_improve_quality_th_79eb7/src/Snps/VariedicInherit.php) - [X] Modify `src/Snps/VariedicInherit.php` ✓ https://github.com/liberu-genealogy/php-dna/commit/f4afb22a3ce1353f8d1dbcc9b79139a1c79c643d [Edit](https://github.com/liberu-genealogy/php-dna/edit/sweep/refactor_generally_to_improve_quality_th_79eb7/src/Snps/VariedicInherit.php) - [X] Modify `src/Snps/VariedicInherit.php` ✓ https://github.com/liberu-genealogy/php-dna/commit/f4afb22a3ce1353f8d1dbcc9b79139a1c79c643d [Edit](https://github.com/liberu-genealogy/php-dna/edit/sweep/refactor_generally_to_improve_quality_th_79eb7/src/Snps/VariedicInherit.php) - [X] Modify `src/Snps/VariedicInherit.php` ✓ https://github.com/liberu-genealogy/php-dna/commit/f4afb22a3ce1353f8d1dbcc9b79139a1c79c643d [Edit](https://github.com/liberu-genealogy/php-dna/edit/sweep/refactor_generally_to_improve_quality_th_79eb7/src/Snps/VariedicInherit.php)
sweep-ai[bot] commented 5 months ago

🚀 Here's the PR! #160

See Sweep's progress at the progress dashboard!
💎 Sweep Pro: I'm using GPT-4. You have unlimited GPT-4 tickets. (tracking ID: 4cf2e13800)

[!TIP] I can email you next time I complete a pull request if you set up your email here!


Actions (click)


Step 1: 🔎 Searching

I found the following snippets in your repository. I will now analyze these snippets and come up with a plan.

Some code snippets I think are relevant in decreasing order of relevance (click to expand). If some file is missing from here, you can mention the path in the ticket description. https://github.com/liberu-genealogy/php-dna/blob/bdab1c6675a0959db6fb9523790b2d2f757ecf08/src/Snps/VariedicInherit.php#L1-L447 https://github.com/liberu-genealogy/php-dna/blob/bdab1c6675a0959db6fb9523790b2d2f757ecf08/src/Snps/Utils.php#L1-L196

Step 2: ⌨️ Coding

Refactor the VariedicInherit class in the src/Snps/VariedicInherit.php file to adhere to PSR-1, PSR-2, and PSR-12 coding standards.

Update with the necessary changes:

Update block with the necessary changes:

Update the __construct() method in the VariedicInherit class to follow PSR standards.

Update with the necessary changes:

/** * @param array $config : scan config */ public function __construct(array $config) { $this->config = $config; $required = [self::KEY_CALLBACK, self::KEY_REMOVED, self::KEY_MAGIC, self::KEY_RESOURCE]; foreach ($required as $key) { if (!isset($this->config[$key])) { $message = sprintf(self::ERR_MISSING_KEY, $key); throw new InvalidArgumentException($message); } } }

Update block with the necessary changes:

/** * VariadicInherit constructor. * * @param array $config Scan configuration * @throws InvalidArgumentException If a required configuration key is missing */ public function __construct(array $config) { $this->config = $config; $required = [ self::KEY_CALLBACK, self::KEY_REMOVED, self::KEY_MAGIC, self::KEY_RESOURCE, ]; foreach ($required as $key) { if (!isset($this->config[$key])) { $message = sprintf(self::ERR_MISSING_KEY, $key); throw new InvalidArgumentException($message); } } }

Update the getFileContents() method in the VariedicInherit class to follow PSR standards.

Update with the necessary changes:

/** * Grabs contents * Initializes messages to [] * Converts "\r" and "\n" to ' ' * * @param string $fn : name of file to scan * @return string $name : classnames */ public function getFileContents(string $fn) : string { if (!file_exists($fn)) { $this->contents = ''; throw new InvalidArgumentException(sprintf(self::ERR_FILE_NOT_FOUND, $fn)); } $this->clearMessages(); $this->contents = file_get_contents($fn); $this->contents = str_replace(["\r","\n"],['', ' '], $this->contents); return $this->contents; }

Update block with the necessary changes:

/** * Get the contents of a file. * * @param string $filePath Path to the file to scan * @return string The file contents with line breaks replaced by spaces * @throws InvalidArgumentException If the file is not found */ public function getFileContents(string $filePath): string { if (!file_exists($filePath)) { $this->contents = ''; throw new InvalidArgumentException( sprintf(self::ERR_FILE_NOT_FOUND, $filePath) ); } $this->clearMessages(); $this->contents = file_get_contents($filePath); $this->contents = str_replace(["\r", "\n"], ['', ' '], $this->contents); return $this->contents; }

Update the getKeyValue() method in the VariedicInherit class to follow PSR standards.

Update with the necessary changes:

public static function getKeyValue(string $contents, string $key, string $delim) { $pos = strpos($contents, $key); if ($pos === FALSE) return ''; $end = strpos($contents, $delim, $pos + strlen($key) + 1); $key = substr($contents, $pos + strlen($key), $end - $pos - strlen($key)); if (is_string($key)) { $key = trim($key); } else { $key = ''; } $key = trim($key); return $key; }

Update block with the necessary changes:

/** * Get the value of a key from a string. * * @param string $contents The string to search * @param string $key The key to search for * @param string $delimiter The delimiter to use * @return string The value of the key, or an empty string if not found */ public static function getKeyValue( string $contents, string $key, string $delimiter ): string { $position = strpos($contents, $key); if ($position === false) { return ''; } $end = strpos($contents, $delimiter, $position + strlen($key) + 1); $value = substr( $contents, $position + strlen($key), $end - $position - strlen($key) ); return is_string($value) ? trim($value) : ''; }

Refactor the VariedicInherit class in the src/Snps/VariedicInherit.php file to adhere to PSR-1, PSR-2, and PSR-12 coding standards.

Update with the necessary changes:

Update block with the necessary changes:

Update the homozygous_snps(), is_valid(), predict_ancestry(), getPredictions(), and maxPop() methods in the VariedicInherit class to follow PSR standards.

Update with the necessary changes:

public function homozygous_snps(string $chrom = "") { trigger_error("This method has been renamed to `homozygous`.", E_USER_DEPRECATED); return $this->homozygous($chrom); } public function is_valid() { trigger_error("This method has been renamed to `valid` and is now a property.", E_USER_DEPRECATED); return $this->valid; } public function predict_ancestry( ?string $output_directory = null, bool $write_predictions = false, ?string $models_directory = null, ?string $aisnps_directory = null, ?int $n_components = null, ?int $k = null, ?string $thousand_genomes_directory = null, ?string $samples_directory = null, ?string $algorithm = null, ?string $aisnps_set = null ) { // Method implementation goes here } public function getPredictions( $output_directory, $write_predictions, $models_directory, $aisnps_directory, $n_components, $k, $thousand_genomes_directory, $samples_directory, $algorithm, $aisnps_set ) { if (!$this->valid) { // If the object is not valid, return an empty array return []; } // Check if ezancestry package is installed if (!class_exists('ezancestry\commands\Predict')) { // Throw an exception if the ezancestry package is not installed throw new Exception('Ancestry prediction requires the ezancestry package; please install it'); } $predict = new ezancestry\commands\Predict(); // Call the predict method of the ezancestry\commands\Predict class $predictions = $predict->predict( $this->snps, $output_directory, $write_predictions, $models_directory, $aisnps_directory, $n_components, $k, $thousand_genomes_directory, $samples_directory, $algorithm, $aisnps_set ); // Get the maxPop values from the first prediction $maxPopValues = $this->maxPop($predictions[0]); // Add the predictions to the maxPopValues array $maxPopValues['ezancestry_df'] = $predictions; // Return the maxPopValues array return $maxPopValues; } private function maxPop($row) { // Extract the values from the $row array $popcode = $row['predicted_population_population']; $popdesc = $row['population_description']; $poppct = $row[$popcode]; $superpopcode = $row['predicted_population_superpopulation']; $superpopdesc = $row['superpopulation_name']; $superpoppct = $row[$superpopcode]; // Return an array with the extracted values return [ 'population_code' => $popcode, 'population_description' => $popdesc, '_percent' => $poppct, 'superpopulation_code' => $superpopcode, 'superpopulation_description' => $superpopdesc, 'population_percent' => $superpoppct, ]; }

Update block with the necessary changes:

/** * Get homozygous SNPs for a given chromosome. * * @param string $chromosome The chromosome to get homozygous SNPs for * @return mixed The result of the homozygous() method * @deprecated Use the homozygous() method instead */ public function homozygous_snps(string $chromosome = '') { trigger_error( 'This method has been renamed to `homozygous`.', E_USER_DEPRECATED ); return $this->homozygous($chromosome); } /** * Check if the object is valid. * * @return bool The value of the "valid" property * @deprecated Use the "valid" property instead */ public function is_valid(): bool { trigger_error( 'This method has been renamed to `valid` and is now a property.', E_USER_DEPRECATED ); return $this->valid; } /** * Predict ancestry using the ezancestry package. * * @param string|null $outputDirectory The output directory for predictions * @param bool $writePredictions Whether to write the predictions to files * @param string|null $modelsDirectory The directory containing the models * @param string|null $aisnpsDirectory The directory containing the AIsnps * @param int|null $nComponents The number of components for the model * @param int|null $k The number of nearest neighbors to use * @param string|null $thousandGenomesDirectory The directory containing the 1000 Genomes data * @param string|null $samplesDirectory The directory containing the samples * @param string|null $algorithm The algorithm to use for prediction * @param string|null $aisnpsSet The set of AIsnps to use * @return array The predicted ancestry values * @throws Exception If the ezancestry package is not installed */ public function predict_ancestry( ?string $outputDirectory = null, bool $writePredictions = false, ?string $modelsDirectory = null, ?string $aisnpsDirectory = null, ?int $nComponents = null, ?int $k = null, ?string $thousandGenomesDirectory = null, ?string $samplesDirectory = null, ?string $algorithm = null, ?string $aisnpsSet = null ): array { return $this->getPredictions( $outputDirectory, $writePredictions, $modelsDirectory, $aisnpsDirectory, $nComponents, $k, $thousandGenomesDirectory, $samplesDirectory, $algorithm, $aisnpsSet ); } /** * Get ancestry predictions using the ezancestry package. * * @param string|null $outputDirectory The output directory for predictions * @param bool $writePredictions Whether to write the predictions to files * @param string|null $modelsDirectory The directory containing the models * @param string|null $aisnpsDirectory The directory containing the AIsnps * @param int|null $nComponents The number of components for the model * @param int|null $k The number of nearest neighbors to use * @param string|null $thousandGenomesDirectory The directory containing the 1000 Genomes data * @param string|null $samplesDirectory The directory containing the samples * @param string|null $algorithm The algorithm to use for prediction * @param string|null $aisnpsSet The set of AIsnps to use * @return array The predicted ancestry values * @throws Exception If the ezancestry package is not installed or the object is not valid */ public function getPredictions( ?string $outputDirectory = null, bool $writePredictions = false, ?string $modelsDirectory = null, ?string $aisnpsDirectory = null, ?int $nComponents = null, ?int $k = null, ?string $thousandGenomesDirectory = null, ?string $samplesDirectory = null, ?string $algorithm = null, ?string $aisnpsSet = null ): array { if (!$this->valid) { return []; } if (!class_exists('ezancestry\commands\Predict')) { throw new Exception( 'Ancestry prediction requires the ezancestry package; please install it' ); } $predict = new ezancestry\commands\Predict(); $predictions = $predict->predict( $this->snps, $outputDirectory, $writePredictions, $modelsDirectory, $aisnpsDirectory, $nComponents, $k, $thousandGenomesDirectory, $samplesDirectory, $algorithm, $aisnpsSet ); $maxPopValues = $this->maxPop($predictions[0]); $maxPopValues['ezancestry_df'] = $predictions; return $maxPopValues; } /** * Get the maximum population values from a prediction row. * * @param array $row The prediction row * @return array The maximum population values */ private function maxPop(array $row): array { $populationCode = $row['predicted_population_population']; $populationDescription = $row['population_description']; $populationPercent = $row[$populationCode]; $superpopulationCode = $row['predicted_population_superpopulation']; $superpopulationDescription = $row['superpopulation_name']; $superpopulationPercent = $row[$superpopulationCode]; return [ 'population_code' => $populationCode, 'population_description' => $populationDescription, '_percent' => $populationPercent, 'superpopulation_code' => $superpopulationCode, 'superpopulation_description' => $superpopulationDescription, 'population_percent' => $superpopulationPercent, ]; }

Update the compute_cluster_overlap() method in the VariedicInherit class to follow PSR standards.

Update with the necessary changes:

/** * Computes cluster overlap based on given threshold. * * @param float $cluster_overlap_threshold The threshold for cluster overlap. * @return DataFrame The computed cluster overlap DataFrame. */ public function compute_cluster_overlap($cluster_overlap_threshold = 0.95) { // Sample data for cluster overlap computation $data = [ "cluster_id" => ["c1", "c3", "c4", "c5", "v5"], "company_composition" => [ "23andMe-v4", "AncestryDNA-v1, FTDNA, MyHeritage", "23andMe-v3", "AncestryDNA-v2", "23andMe-v5, LivingDNA", ], "chip_base_deduced" => [ "HTS iSelect HD", "OmniExpress", "OmniExpress plus", "OmniExpress plus", "Illumina GSAs", ], "snps_in_cluster" => array_fill(0, 5, 0), "snps_in_common" => array_fill(0, 5, 0), ]; // Create a DataFrame from the data and set "cluster_id" as the index $df = new DataFrame($data); $df->setIndex("cluster_id"); $to_remap = null; if ($this->build != 37) { // Create a clone of the current object for remapping $to_remap = clone $this; $to_remap->remap(37); // clusters are relative to Build 37 $self_snps = $to_remap->snps()->select(["chrom", "pos"])->dropDuplicates(); } else { $self_snps = $this->snps()->select(["chrom", "pos"])->dropDuplicates(); } // Retrieve chip clusters from resources $chip_clusters = $this->resources->get_chip_clusters(); // Iterate over each cluster in the DataFrame foreach ($df->indexValues() as $cluster) { // Filter chip clusters based on the current cluster $cluster_snps = $chip_clusters->filter(function ($row) use ($cluster) { return strpos($row["clusters"], $cluster) !== false; })->select(["chrom", "pos"]); // Update the DataFrame with the number of SNPs in the cluster and in common with the current object $df->loc[$cluster]["snps_in_cluster"] = count($cluster_snps); $df->loc[$cluster]["snps_in_common"] = count($self_snps->merge($cluster_snps, "inner")); // Calculate overlap ratios for cluster and self $df["overlap_with_cluster"] = $df["snps_in_common"] / $df["snps_in_cluster"]; $df["overlap_with_self"] = $df["snps_in_common"] / count($self_snps); // Find the cluster with the maximum overlap $max_overlap = array_keys($df["overlap_with_cluster"], max($df["overlap_with_cluster"]))[0]; // Check if the maximum overlap exceeds the threshold for both cluster and self if ( $df["overlap_with_cluster"][$max_overlap] > $cluster_overlap_threshold && $df["overlap_with_self"][$max_overlap] > $cluster_overlap_threshold ) { // Update the current object's cluster and chip based on the maximum overlap $this->cluster = $max_overlap; $this->chip = $df["chip_base_deduced"][$max_overlap]; $company_composition = $df["company_composition"][$max_overlap]; // Check if the current object's source is present in the company composition if (strpos($company_composition, $this->source) !== false) { if ($this->source === "23andMe" || $this->source === "AncestryDNA") { // Extract the chip version from the company composition $i = strpos($company_composition, "v"); $this->chip_version = substr($company_composition, $i, $i + 2); } } else { // Log a warning about the SNPs data source not found } } } // Return the computed cluster overlap DataFrame return $df; } }

Update block with the necessary changes:

/** * Compute cluster overlap based on a given threshold. * * @param float $clusterOverlapThreshold The threshold for cluster overlap * @return DataFrame The computed cluster overlap DataFrame */ public function computeClusterOverlap(float $clusterOverlapThreshold = 0.95): DataFrame { $data = [ 'cluster_id' => ['c1', 'c3', 'c4', 'c5', 'v5'], 'company_composition' => [ '23andMe-v4', 'AncestryDNA-v1, FTDNA, MyHeritage', '23andMe-v3', 'AncestryDNA-v2', '23andMe-v5, LivingDNA', ], 'chip_base_deduced' => [ 'HTS iSelect HD', 'OmniExpress', 'OmniExpress plus', 'OmniExpress plus', 'Illumina GSAs', ], 'snps_in_cluster' => array_fill(0, 5, 0), 'snps_in_common' => array_fill(0, 5, 0), ]; $df = new DataFrame($data); $df->setIndex('cluster_id'); $toRemap = null; if ($this->build !== 37) { $toRemap = clone $this; $toRemap->remap(37); $selfSnps = $toRemap->snps()->select(['chrom', 'pos'])->dropDuplicates(); } else { $selfSnps = $this->snps()->select(['chrom', 'pos'])->dropDuplicates(); } $chipClusters = $this->resources->getChipClusters(); foreach ($df->indexValues() as $cluster) { $clusterSnps = $chipClusters->filter( function ($row) use ($cluster) { return strpos($row['clusters'], $cluster) !== false; } )->select(['chrom', 'pos']); $df->loc[$cluster]['snps_in_cluster'] = count($clusterSnps); $df->loc[$cluster]['snps_in_common'] = count($selfSnps->merge($clusterSnps, 'inner')); $df['overlap_with_cluster'] = $df['snps_in_common'] / $df['snps_in_cluster']; $df['overlap_with_self'] = $df['snps_in_common'] / count($selfSnps); $maxOverlap = array_keys($df['overlap_with_cluster'], max($df['overlap_with_cluster']))[0]; if ( $df['overlap_with_cluster'][$maxOverlap] > $clusterOverlapThreshold && $df['overlap_with_self'][$maxOverlap] > $clusterOverlapThreshold ) { $this->cluster = $maxOverlap; $this->chip = $df['chip_base_deduced'][$maxOverlap]; $companyComposition = $df['company_composition'][$maxOverlap]; if (strpos($companyComposition, $this->source) !== false) { if ($this->source === '23andMe' || $this->source === 'AncestryDNA') { $i = strpos($companyComposition, 'v'); $this->chip_version = substr($companyComposition, $i, $i + 2); } } else { // Log a warning about the SNPs data source not found } } } return $df; } }


Step 3: 🔁 Code Review

I have finished reviewing the code for completeness. I did not find errors for sweep/refactor_generally_to_improve_quality_th_79eb7.


🎉 Latest improvements to Sweep:
  • New dashboard launched for real-time tracking of Sweep issues, covering all stages from search to coding.
  • Integration of OpenAI's latest Assistant API for more efficient and reliable code planning and editing, improving speed by 3x.
  • Use the GitHub issues extension for creating Sweep issues directly from your editor.

💡 To recreate the pull request edit the issue title or description. Something wrong? Let us know.

This is an automated message generated by Sweep AI.