oobianom / quickcode

An R package made from Brice's and my scrapbook of much-needed functions.
https://quickcode.obi.obianom.com

New feature: add various equations #16

Open oobianom opened 10 months ago

oobianom commented 10 months ago

quickcode currently offers geometric mean, standard deviation (SD), and coefficient of variation (CV) calculations. It would be good to expand this area of the package to include many more frequently used equations across various disciplines.

Ideas for equations are very welcome.
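For reference, the three statistics mentioned above take only a few lines of base R. This is a minimal sketch: the helper names `geo_mean()` and `cv()` are hypothetical and are not quickcode's own functions, and `cv()` assumes the sample standard deviation returned by `sd()`.

```r
# Hypothetical helpers for illustration; not quickcode's API.

# Geometric mean: exp of the mean of the logs (defined for positive values only)
geo_mean <- function(x) {
  stopifnot(is.numeric(x), all(x > 0))
  exp(mean(log(x)))
}

# Coefficient of variation: sample standard deviation divided by the mean
cv <- function(x) sd(x) / mean(x)

x <- c(2, 8)
geo_mean(x)  # sqrt(2 * 8) = 4
sd(x)        # sqrt(18), about 4.243
cv(x)        # sqrt(18) / 5, about 0.849
```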

brichard1638 commented 10 months ago

This is a very timely request!

I am very interested in sharing an equation that should be added to the quickcode package.

I recently searched the web for an R implementation of this equation but was unable to find the exact equation offered in this recommendation.

PROPOSED FUNCTION NAME: pairdist
WHAT IT DOES: Using multivariate data, computes the distance of each point from the center (centroid) of a cluster
INPUT REQUIREMENT: Accepts either a data frame or a matrix of numeric variables only; no qualitative or binary data
EQUATION: sqrt(rowSums((DS - matrix(colMeans(DS), n, v, byrow = TRUE))^2))
EQUATION EXPLANATION:
CONSTRAINT: The user should not be able to pass non-contiguous observations to the n argument; this argument should select observations 1 through n, where n is at least 1 and no greater than the total number of observations
ALTERNATIVE FUNCTION STRUCTURE: To simplify the pairdist function, remove the arguments n and v; the redefined function would then process all observations and all variables passed to it
FUNCTION STRUCTURE WITH N & V: pairdist(data, n, v)
BASIC FUNCTION STRUCTURE: pairdist(data)
OUTPUT: The function returns a named vector of pair-distance values, named by row number
FUNCTION UTILITY: Generates the computations needed to model pair-distance measures in three dimensions
X-REFERENCE: See p. 15 of the svgViewR package, version 1.4.3, where the equation can be found and verified
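The EQUATION above is the Euclidean distance of each row from the column-mean centroid. Below is a minimal sketch of the proposed pairdist(data, n, v) signature. It assumes that n selects rows 1..n and that v is the number of columns (which is how `matrix()` uses it in the equation), and that DS is the first n rows of the input; `pairdist_sketch` is a hypothetical name, not the function eventually added to quickcode.

```r
# Hypothetical sketch of the proposed pairdist(data, n, v); not quickcode's code.
pairdist_sketch <- function(data, n = nrow(data), v = ncol(data)) {
  DS <- as.matrix(data)
  stopifnot(is.numeric(DS))  # numeric variables only
  # CONSTRAINT: contiguous rows 1..n only, with 1 <= n <= total observations
  stopifnot(length(n) == 1, n >= 1, n <= nrow(DS), n == round(n))
  stopifnot(v == ncol(DS))   # v is a column count, not a column selector
  DS <- DS[seq_len(n), , drop = FALSE]
  # Each row of this n x v matrix holds the column means (the centroid)
  centroid <- matrix(colMeans(DS), n, v, byrow = TRUE)
  d <- sqrt(rowSums((DS - centroid)^2))
  names(d) <- seq_len(n)     # named by row number, as in OUTPUT
  d
}

# Four corners of a unit square: each corner is sqrt(0.5) from the centre
sq <- data.frame(x = c(0, 1, 0, 1), y = c(0, 0, 1, 1))
pairdist_sketch(sq)
```

Because `matrix()` receives both n and v here, every row of the centroid matrix is the full vector of column means, so the subtraction is conformable.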

oobianom commented 10 months ago

Cool. I will work on including this. By "See p. 15 of the svgViewR package, version 1.4.3", are you saying the equation already exists in that package? I did check but didn't see it in "https://cran.r-project.org/web/packages/svgViewR/svgViewR.pdf"

brichard1638 commented 10 months ago

Obi:::

I apologize for the confusion. That reference verifies the equation itself; it is not a separate function provided in that package. It should, however, be an independent function, which is why I suggested it.

The layout of the equation for use in the development of a new function can be found as an example on p. 15 of the svgViewR package.

Brice


From: Obi Obianom Sent: Friday, January 26, 2024 11:16 PM To: oobianom/quickcode Cc: brichard1638; Comment Subject: Re: [oobianom/quickcode] New feature: add various equations (Issue #16)


oobianom commented 10 months ago

Ok I get it now. I started the function and have a rough draft.

I just looked up equations in physics and found this: https://physics.info/equations/ For pharmacometrics, here: https://pharmacy.ufl.edu/files/2013/01/5127-28-equations.pdf There are also several other major equations across fields. What do you think? Would it be a good idea to make individual functions to calculate them as well? At least, that was my original intent.

brichard1638 commented 10 months ago

Obi:::

The documents you provided on equations are rather comprehensive. Certainly, one could develop an R package devoted solely to equations. A quick search of CRAN revealed 64 packages with the word "equation" in either the package name or its title.

My recommendation would be that if you are going to provide equations in your quickcode package, ensure that the documentation showcases at least two fundamental data requirements:

For example, the pair distance function I recommended should incorporate a code snippet that cross-references the svgViewR package in which a multivariate dataset is converted into a 3D model of data points.

In my opinion, this optimizes understanding of the context in which an equation can be used. Context matters if you want R users to apply your functions, especially equations, which, quite frankly, can be rather abstract because they are often not tied to any contextual utility. For example, E = mc² means nothing to most people. But if you tie that formula to a code snippet along with a plot, it tells a story that directly anchors learning to the equation.
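As a small illustration of pairing an equation with code and a plot, here is a sketch of E = mc² in base R. The function name `mass_energy()` is hypothetical; c is taken as 299,792,458 m/s (exact under the SI definition), and the plot is drawn only in an interactive session.

```r
# Hypothetical helper: rest energy in joules for a mass in kilograms
mass_energy <- function(m_kg) {
  c_light <- 299792458  # speed of light in m/s (exact, SI definition)
  m_kg * c_light^2
}

mass_energy(0.001)  # one gram: about 9e13 J

# The "story" part: how energy scales with mass, shown as a plot
if (interactive()) {
  m <- seq(0, 0.01, length.out = 100)  # 0 to 10 grams, in kg
  plot(m * 1000, mass_energy(m), type = "l",
       xlab = "Mass (g)", ylab = "Energy (J)", main = "E = mc^2")
}
```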

This became a rather powerful idea for me when I was studying octonions, using them to determine the amount of thrust needed to push an object into a parabolic distribution path.

Brice


From: Obi Obianom Sent: Saturday, January 27, 2024 1:50 AM To: oobianom/quickcode Cc: brichard1638; Comment Subject: Re: [oobianom/quickcode] New feature: add various equations (Issue #16)


oobianom commented 10 months ago

Really great analysis and recommendation there, Brice. I fully agree with you. It does make sense to provide a context around the equation.

Let's see what I can put together based on the links I provided, maybe just a few of the more commonly used ones for now. Then we will see how that looks.

oobianom commented 9 months ago

Hey Brice, after thinking more about the above proposal for a comprehensive list of equations, I am reverting to your initial response. It may be better to develop a separate package of just equations, or else to pick only key ones across disciplines. For this reason, I am postponing implementing that list until perhaps v0.8.

brichard1638 commented 9 months ago

Obi:::

Understood! Sounds good.


From: Obi Obianom Sent: Tuesday, February 6, 2024 1:34 AM To: oobianom/quickcode Cc: brichard1638; Comment Subject: Re: [oobianom/quickcode] New feature: add various equations (Issue #16)


oobianom commented 9 months ago

Finalizing the pairDist function now

brichard1638 commented 9 months ago

Awesome!


From: Obi Obianom Sent: Sunday, February 18, 2024 4:01 PM To: oobianom/quickcode Cc: brichard1638; Comment Subject: Re: [oobianom/quickcode] New feature: add various equations (Issue #16)


oobianom commented 9 months ago

Uploaded updates. Brice, can you test the pairDist function and let me know if there is something missing.

brichard1638 commented 9 months ago

I installed the latest GitHub version of quickcode and was unable to test the pairDist function. It was not contained in that version. Please advise.


From: Obi Obianom Sent: Sunday, February 18, 2024 4:18 PM To: oobianom/quickcode Cc: brichard1638; Comment Subject: Re: [oobianom/quickcode] New feature: add various equations (Issue #16)


oobianom commented 9 months ago

This is surprising. Can you try again now? It is currently in the namespace: https://github.com/oobianom/quickcode/blob/main/NAMESPACE

Try restarting R with .rs.restartR() (in RStudio) after installing; maybe that will help.
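For anyone else following along, a sketch of the install-and-check steps is below. It assumes the `remotes` package is used to install from GitHub and that the repository slug is `oobianom/quickcode` (as in the NAMESPACE link above). `.rs.restartR()` exists only inside RStudio, so it is guarded, and the install itself runs only in an interactive session.

```r
# Install the development version from GitHub (needs 'remotes' and network
# access); a wrong repository slug here installs the wrong package.
if (interactive() && requireNamespace("remotes", quietly = TRUE)) {
  remotes::install_github("oobianom/quickcode")
  # RStudio only: restart the session so the freshly installed build is loaded
  if (exists(".rs.restartR")) .rs.restartR()
}

# After restarting, check whether the installed build exports pairDist
has_pairDist <- requireNamespace("quickcode", quietly = TRUE) &&
  "pairDist" %in% getNamespaceExports("quickcode")
print(has_pairDist)
```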

brichard1638 commented 9 months ago

My error. I had the wrong namespace in the install line. Stand by for testing results...


From: Obi Obianom Sent: Sunday, February 18, 2024 4:39 PM To: oobianom/quickcode Cc: brichard1638; Comment Subject: Re: [oobianom/quickcode] New feature: add various equations (Issue #16)


brichard1638 commented 9 months ago

The following issues were noted while testing the pairDist function on version 0.7 of quickcode:

* The following warning was generated when two of the four available columns were selected in the target dataset:

 head(pairDist(data = dpiz, n = 917, v = c(2,3)))
     1         2         5         6         7         9
 1.4406731 1.7001983 1.1369441 3.7624087 3.2435808 0.4790598
 Warning message:
 In matrix(colMeans(data), n, v, byrow = TRUE) :
 data length [4] is not a sub-multiple or multiple of the number of rows [917]

* In the documentation for the pairDist function, under the Arguments section, the word "atleast" is misspelled and should be written as two words: "at least"

Recommendations:

There are two recommendations regarding the utility of the pairDist function.

The first recommendation is that, in its current form, only data frames can be passed to the function's data argument. It is strongly recommended that objects of class matrix/array also be accepted.

The second recommendation concerns the v argument, which is believed to be the source of the problem cited in the first bullet. Including this argument may be more of a challenge to configure correctly than is necessary. There are two positions on whether it should remain a user option in the function going forward:

The first position holds that the v argument should be removed, because the onus of providing a complete dataset for use with pairDist lies with the user. If only 3 variables are needed for the distance calculation, then the passed dataset should contain exactly those 3 variables, no more, no less. It could also be argued that keeping the v argument would make it extremely difficult to determine which numeric variables in a larger dataset (for example, one with 50 variables) were used to compute the pair distance. Passing a single dataset in which all variables are used makes it clear which variables went into the computation.

The second position holds that the v argument should be kept because it maximizes the utility of the pairDist function. However, resolving the current error may take considerably more code, not to mention that additional errors may be discovered during regression testing.

* I believe the original calculation I provided for the pairDist function used v as an argument to the matrix function, where it specified the number of columns passed from the dataset. It appears, and I could be wrong here, that v was never designed to select which columns (contiguous or non-contiguous) to use; its value was simply meant to be the total number of columns in the original dataset. If this is true, this argument will need to be reworked within the function if it is to be kept.

I hope these comments make sense. If not, let me know and I will provide additional guidance on the issue here.

P.S. If you were to ask me whether the v argument should be an option for selecting contiguous or non-contiguous columns from a passed dataset, my gut tells me that while it may make the function easier to use, it would make it very difficult, when developing a process model, to discern which variables were used in the computation. This contrasts with passing a single dataset, where the presumption is that all variables were used because the user had no option to select a combination of variables.

I will defer to you on this one. I believe that there are strong arguments to be made on both sides as to whether or not to keep a programmatic option available that allows the user to select which combination of variables to use in a pair distance computation, but whatever option you select is fine with me. I do not hold a firm position on this issue one way or the other.


From: Brice Richard Sent: Sunday, February 18, 2024 4:45 PM To: oobianom/quickcode Subject: Re: [oobianom/quickcode] New feature: add various equations (Issue #16)


oobianom commented 9 months ago

Thanks for testing the function and the constructive feedback.

To be honest, I didn't know about this calculation until you suggested it, and I still don't know much about it. However, I believe it will be useful for users in the fields that need it.

So on that note and based on your suggestion above, I re-worked the function so that v is no longer used. In the future, if needed, we can find a way to bring it back and make sure it works.

As for the documentation, I updated it as well.

For your text "It is strongly recommended that data objects of class matrix/array also be included.", I am currently not restricting the input to only data frames, so this should be fine.

I updated the function, you may take a look.

brichard1638 commented 9 months ago

Obi:::

Let me briefly describe the significance of that function. I would argue that its impact should not be underestimated. It is the basis for taking multivariate data and creating calculations that can be converted into three-dimensional data profiles.

So, as an example, imagine you want to see what the distance metric looks like in three dimensions for air pressure, temperature, humidity, and small particulate matter as a function of a reading. Say these readings are taken every 10 minutes for 24 hours. You can use the distance measurement offered by your pairDist function to generate a three-dimensional profile of particulate matter for a specific day. What makes this model so interesting is that it provides a visual profile of what an average distance metric looks like using critical multivariate data.

Here's the benefit: imagine now that you can compare multiple three dimensional profiles of data both visually and statistically. This capability I believe offers unprecedented potential in both technology and the sciences.
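To make the weather example concrete, the sketch below simulates one day of readings taken every 10 minutes (24 × 6 = 144 rows) for four variables and computes each reading's distance from the day's centroid. The data are randomly generated for illustration only, the variable names and units are hypothetical, and the columns are standardized with `scale()` first (an assumption, so that no single unit dominates the distance). The 3D rendering step (for example with svgViewR) is left out.

```r
set.seed(1)
n_readings <- 24 * 6  # one reading every 10 minutes for 24 hours

# Simulated stand-in data; not real measurements
readings <- data.frame(
  pressure    = rnorm(n_readings, 1013, 5),  # hPa
  temperature = rnorm(n_readings, 20, 3),    # deg C
  humidity    = rnorm(n_readings, 60, 10),   # percent
  pm25        = rlnorm(n_readings, 2, 0.5)   # ug/m^3
)

# Standardize so each variable contributes on a comparable scale
z <- scale(readings)

# Distance of each reading from the centroid (column means of z)
dist_to_centre <- sqrt(rowSums(sweep(z, 2, colMeans(z))^2))

summary(dist_to_centre)
# Row numbers of the readings farthest from the day's "average" conditions
head(order(dist_to_centre, decreasing = TRUE), 5)
```

Repeating this for several days would give one distance profile per day, which is the kind of comparison described above.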

I've also used the pairDist algorithm to showcase what stock market pricing generally looks like through the lens of a Cauchy distribution. In my opinion, it ideally explains the boom-and-bust cycle in three dimensions! It was remarkable to model the stock market that way.

Closely related to this metric, cluster modeling as a form of machine learning uses a nearly identical algorithm to generate statistically significant cluster groups on unsupervised data. I am currently developing an advanced, reproducible process model for cluster model development and predictability using k-means clustering, but not in a way that can be found anywhere on the internet.
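The connection to k-means can be seen in base R: `stats::kmeans()` assigns each point to its nearest cluster centre, and the same row-wise distance formula, applied against each point's assigned centre rather than a single global centroid, recovers the total within-cluster sum of squares that k-means minimizes. This is a minimal sketch on the built-in `iris` measurements; the choice of 3 clusters and `nstart = 10` are illustrative assumptions.

```r
set.seed(42)
x <- scale(iris[, 1:4])  # four numeric measurements, standardized

fit <- kmeans(x, centers = 3, nstart = 10)

# Centre assigned to each row: index the centres matrix by cluster label
assigned <- fit$centers[fit$cluster, , drop = FALSE]

# Same formula as pairdist, but measured against each point's own centre
d <- sqrt(rowSums((x - assigned)^2))

# Squared distances sum to the total within-cluster sum of squares
all.equal(sum(d^2), fit$tot.withinss)

# Average distance to centre within each cluster
tapply(d, fit$cluster, mean)
```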

This is because I have access to the most cutting-edge technologies developed in this space over the last 5 years, which are not found online.

Now, let's fast-forward to pharmacokinetics. While I'm not an expert in this space, I would argue that one could better understand rates of medicinal absorption in the body by studying the boundaries of medicinal absorption. How can this be done? Through cluster modeling. Once you have an optimized number of clusters defining the dataset, you can begin to study the clusters relative to absorption rates.

I'm sure there are plenty of algorithms that can be used to study this, but if you want to be able to predict absorption rates for example, one way to do this would be through a deep understanding of boundaries computed through distance metrics used as the basis for capturing statistically significant cluster groups.

The more I study statistics and machine learning modeling, the more I'm convinced that every living process on Earth is largely operationally maintained by cycles and thresholds. Cluster modeling, in my opinion, allows one to take a deep statistical dive into the peripheries driving these operational thresholds.

By the way, three-dimensional modeling can also be effectively used as a marketing tool because it is basically an interactive profile map that anyone can conceptually understand. It should not be difficult to secure funding if you are using these models.

Now, you are involved in new drug discovery and development at J&J. I assure you there are untapped opportunities in using cluster modeling on unsupervised datasets to better understand and advance new drug discovery. If you aren't using cluster modeling, you certainly should.

I apologize for the divergence of information from your original content. I will test the updated version of the pairDist function and respond accordingly.  


From: Obi Obianom Sent: Sunday, February 18, 2024 9:31 PM To: oobianom/quickcode Cc: brichard1638; Comment Subject: Re: [oobianom/quickcode] New feature: add various equations (Issue #16)


brichard1638 commented 9 months ago

I tested the pairDist function on version 0.7 of quickcode.

The following warning is still being returned:

 Warning message:
 In matrix(colMeans(data), n, byrow = TRUE) :
 data length [4] is not a sub-multiple or multiple of the number of rows [917]

The warning is generated because the matrix function is given only the 4 column means and the number of rows; it recycles those 4 values down the rows and warns when the number of rows is not a multiple of the data length (here, the number of columns). So, in the example above, 917 is not divisible by 4. However, if the n argument is changed to 916, the warning goes away.

The matrix function is causing the issue here. The function should be modified so that the row sums and column means can be computed without throwing an error.
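The warning can be reproduced without quickcode. In this minimal sketch, a simulated 917 × 4 matrix stands in for `dpiz` (an assumption; the real dataset is not shown in this thread). Because `matrix()` is called without `ncol`, R infers a single column and builds a 917 × 1 matrix by recycling the 4 column means; subtracting that from 917 × 4 data is what produces "non-conformable arrays". Supplying both dimensions, or subtracting the column means with `sweep()`, avoids the problem.

```r
set.seed(123)
# Simulated stand-in for dpiz: 917 observations of 4 numeric variables
dat <- matrix(rnorm(917 * 4), nrow = 917, ncol = 4)
n <- nrow(dat)

# As in the warning: no ncol, so R infers ncol = ceiling(4 / 917) = 1
bad <- suppressWarnings(matrix(colMeans(dat), n, byrow = TRUE))
dim(bad)  # 917 x 1, not 917 x 4

# Fix 1: give matrix() both dimensions so every row holds all column means
centre <- matrix(colMeans(dat), nrow = n, ncol = ncol(dat), byrow = TRUE)
d1 <- sqrt(rowSums((dat - centre)^2))

# Fix 2: subtract the column means directly, with no helper matrix at all
d2 <- sqrt(rowSums(sweep(dat, 2, colMeans(dat))^2))

all.equal(d1, d2)  # TRUE
```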



oobianom commented 9 months ago

Hmm okay. I just updated the function so that it checks the n argument before proceeding. What do you think of this solution?

oobianom commented 9 months ago


I appreciate the explanations. Thanks.

brichard1638 commented 9 months ago

I tested the latest changes made to version 0.7 of the quickcode package.

With the latest changes to the pairDist function, the scope of the errors increased. The following errors were returned during testing:

Error 1:

 pairDist(data = dpiz_scale, n = 916)
 Error in pairDist(data = dpiz_scale, n = 916) :
 n must be a multiple of data length e.g. n = 3668 or 7336 etc

Error 2:

 pairDist(data = dpiz_scale, n = 917)
 Error in data - matrix(colMeans(data), n, byrow = TRUE) :
 non-conformable arrays
 In addition: Warning message:
 In matrix(colMeans(data), n, byrow = TRUE) :
 data length [4] is not a sub-multiple or multiple of the number of rows [917]

This is the same error that has been returned over the last few cycles. To mitigate this error, I believe you will have to modify the pairDist function such that the calculations can be accurately processed without using the matrix function.
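One way to follow this suggestion is to drop `matrix()` entirely and to validate n against the CONSTRAINT from the original proposal (1 <= n <= number of rows) instead of requiring n to be a multiple of the data length. The sketch below is hypothetical (`pair_dist_free` is not quickcode's function), and it assumes the centroid is computed from the same first n rows.

```r
# Hypothetical matrix-free sketch; not quickcode's implementation.
pair_dist_free <- function(data, n = nrow(data)) {
  m <- as.matrix(data)
  if (!is.numeric(m)) stop("data must contain numeric variables only")
  if (length(n) != 1 || is.na(n) || n < 1 || n > nrow(m) || n != round(n)) {
    stop("n must be a whole number between 1 and ", nrow(m))
  }
  m <- m[seq_len(n), , drop = FALSE]   # contiguous rows 1..n only
  centred <- sweep(m, 2, colMeans(m))  # subtract column means, no matrix()
  d <- sqrt(rowSums(centred^2))
  names(d) <- seq_len(n)               # named by row number
  d
}

set.seed(7)
dat <- as.data.frame(matrix(rnorm(917 * 4), ncol = 4))
length(pair_dist_free(dat, n = 917))  # 917, with no warning
length(pair_dist_free(dat, n = 916))  # 916
```

With this design, any whole n from 1 to the number of rows is accepted, so neither 916 nor 917 triggers a warning or error.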


From: Obi Obianom Sent: Monday, February 19, 2024 2:06 PM To: oobianom/quickcode Cc: brichard1638; Comment Subject: Re: [oobianom/quickcode] New feature: add various equations (Issue #16)

Obi::: Let me briefly describe the significance of that function. I would argue that its impact cannot be overstated. It is the basis for taking multivariate data and creating calculations that can be converted into three-dimensional data profiles. As an example, imagine you want to see what the distance metric looks like in three dimensions for air pressure, temperature, humidity, and small particulate matter as a function of a reading. Say these readings are taken every 10 minutes for 24 hours. You can use the distance measurement offered by your pairDist function to generate a three-dimensional profile of particulate matter for a specific day. What makes this model so interesting is that it provides a visual profile of what an average distance metric looks like using critical multivariate data. Here's the benefit: imagine that you can now compare multiple three-dimensional profiles of data both visually and statistically. I believe this capability offers unprecedented potential in both technology and the sciences.

I've also used the pairDist algorithm to showcase what stock market pricing generally looks like from a Cauchy distribution. In my opinion, it ideally explains the boom-and-bust cycle in three dimensions! It was remarkable to model the stock market through that lens. In juxtaposition to this metric, cluster modeling, as a form of machine learning, uses a nearly identical algorithm to generate statistically significant cluster groups on unsupervised data. I am currently developing an advanced, reproducible process model for cluster model development and predictability using k-means clustering, but not in a way that can be found anywhere on the internet. This is because I have access to the most cutting-edge technologies developed in this space over the last 5 years, which are not found online.

Now, let's fast-forward to pharmacokinetics. While I'm not an expert in this space, I would argue that one could better understand rates of medicinal absorption in the body by studying the boundaries of medicinal absorption. How can this be done? Through cluster modeling. Once you have an optimized number of clusters defining the dataset, you can begin to study the clusters relative to absorption rates. I'm sure there are plenty of algorithms that can be used to study this, but if you want to predict absorption rates, for example, one way to do so would be through a deep understanding of the boundaries computed by the distance metrics used to capture statistically significant cluster groups. The more I study statistics and machine learning modeling, the more I'm convinced that every living process on Earth is largely operationally maintained by cycles and thresholds. Cluster modeling, in my opinion, allows one to take a deep statistical dive into the peripheries driving these operational thresholds.

By the way, three-dimensional modeling can also be used effectively as a marketing tool because it is basically an interactive profile map that anyone can conceptually understand. It should not be difficult to secure funding if you are using these models. Now, you are involved in new drug discovery and development at J&J. I assure you there are untapped opportunities in using unsupervised datasets with cluster modeling to better understand and identify new drugs. If you aren't using cluster modeling, you certainly should. I apologize for straying from your original point. I will test the updated version of the pairDist function and respond accordingly.
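Brice notes that k-means clustering relies on a nearly identical distance computation. As a hedged illustration (the data are made up, and the helper name `dist_to_center` is hypothetical, not part of quickcode), the sketch below uses base R's `stats::kmeans()` and computes each point's Euclidean distance to the center of the cluster it was assigned to. The squared distances add up to the model's reported `tot.withinss`.

```r
# Hypothetical helper: Euclidean distance from each row of `x` to the
# corresponding row of `centers` (both numeric matrices of the same shape)
dist_to_center <- function(x, centers) {
  sqrt(rowSums((x - centers)^2))
}

set.seed(42)
# Made-up data: two well-separated groups of 10 points in three variables
x <- rbind(
  matrix(rnorm(30, mean = 0), ncol = 3),
  matrix(rnorm(30, mean = 5), ncol = 3)
)

fit <- stats::kmeans(x, centers = 2, nstart = 10)

# fit$centers[fit$cluster, ] repeats each point's own cluster center, row by row
d <- dist_to_center(x, fit$centers[fit$cluster, , drop = FALSE])

# The total within-cluster sum of squares equals the sum of squared distances
all.equal(sum(d^2), fit$tot.withinss)
```

With a single cluster, the only center is the vector of column means, which is the centroid that pairDist measures from.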
From: Obi Obianom
Sent: Sunday, February 18, 2024 9:31 PM
To: oobianom/quickcode
Cc: brichard1638
Subject: Re: [oobianom/quickcode] New feature: add various equations (Issue #16)

Thanks for testing the function and for the constructive feedback. To be honest, I didn't know about this calculation until you suggested it, and I still don't know much about it. However, I believe it will be useful for users in the fields that need it. So, based on your suggestion above, I reworked the function so that v is no longer used. In the future, if needed, we can find a way to bring it back and make sure it works. I updated the documentation as well. As for your text "It is strongly recommended that data objects of class matrix/array also be included.", I am currently not restricting the input to data frames only, so this should be fine. I updated the function; you may take a look.
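The simplified `pairDist(data)` structure (no `n` or `v` arguments) can be sketched as follows. This is a hypothetical sketch, not the function shipped in quickcode: the name `pair_dist_basic` and the error messages are assumptions. It accepts a data frame or a matrix, rejects non-numeric input, derives `n` and `v` from the data, and returns a vector named by row number.

```r
# Hypothetical sketch of the simplified pairDist(data) structure:
# every row and every column of the input is used
pair_dist_basic <- function(data) {
  if (!is.data.frame(data) && !is.matrix(data)) {
    stop("`data` must be a data frame or a matrix")
  }
  ds <- as.matrix(data)
  # Character or factor columns turn the whole matrix non-numeric
  if (!is.numeric(ds)) {
    stop("`data` must contain numeric variables only")
  }
  n <- nrow(ds)  # number of observations, derived from the data
  v <- ncol(ds)  # number of variables, derived from the data
  centroid <- matrix(colMeans(ds), n, v, byrow = TRUE)
  out <- sqrt(rowSums((ds - centroid)^2))
  names(out) <- seq_len(n)  # name each distance by its row number
  out
}

pts <- data.frame(x = c(0, 6), y = c(0, 8))
pair_dist_basic(pts)  # center is (3, 4), so both distances are 5
```

Deriving `n` and `v` from the input keeps the centroid matrix the same shape as the data; note that a numeric 0/1 column cannot be told apart from other numeric data by this check.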

I appreciate the explanations. Thanks.


oobianom commented 9 months ago

I am not sure how to revise the function. If you have any ideas, let me know. In the meantime, I will see what I can do.

brichard1638 commented 9 months ago

I'm sure the function can be reworked without applying the matrix function, but just to maintain the integrity of the math here, I believe we should replicate the algorithm used in the original R package (svgViewR):

dataset = passed dataset of class data.frame or matrix/array
n = number of observations
v = number of variables

sqrt(rowSums((dataset - matrix(colMeans(dataset), n, v, byrow = TRUE))^2))
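Written out term by term, and assuming `dataset` holds exactly n observations (rows) of v variables (columns), with $x_{ij}$ denoting observation i of variable j, the expression computes the Euclidean distance of each row from the vector of column means:

```latex
\bar{x}_j = \frac{1}{n}\sum_{k=1}^{n} x_{kj}, \quad j = 1, \dots, v,
\qquad
d_i = \sqrt{\sum_{j=1}^{v} \left( x_{ij} - \bar{x}_j \right)^2}, \quad i = 1, \dots, n
```

Here `colMeans(dataset)` supplies each $\bar{x}_j$, `matrix(colMeans(dataset), n, v, byrow = TRUE)` repeats that vector of means as every one of the n rows, and `sqrt(rowSums(...^2))` returns one $d_i$ per row.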


Sent by email in reply to Obi Obianom's message of Monday, February 19, 2024, 11:04 PM.


oobianom commented 9 months ago

But we were having some additional errors with the "v" argument, which is why we took it out. I don't know whether putting it back would necessarily solve the errors you reported above.

I made some minor edits to the function nonetheless; you may test it and see whether it improves things. I noticed that n has to be a multiple of the data column length, so I made that change in the first few lines of the function.
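One plausible contributor to trouble with `v`, offered as an assumption rather than a diagnosis of the errors reported in this thread: `matrix(..., byrow = TRUE)` fills the centroid row by row, so the column means only line up with the data when `v` equals the number of columns. The toy matrix below is made up.

```r
# Made-up 3 x 3 dataset: 3 observations of 3 variables
ds <- matrix(c(1, 2, 3,
               4, 5, 6,
               7, 8, 9), nrow = 3, byrow = TRUE)
mu <- colMeans(ds)  # column means: 4, 5, 6

# v equals ncol(ds): every row of the centroid matrix is c(4, 5, 6)
ok <- matrix(mu, nrow = 3, ncol = 3, byrow = TRUE)

# v smaller than ncol(ds): the means wrap around, giving rows (4, 5), (6, 4), (5, 6);
# no warning is raised because 3 * 2 is a multiple of length(mu)
bad <- matrix(mu, nrow = 3, ncol = 2, byrow = TRUE)

# Subtracting a 3 x 2 matrix from a 3 x 3 matrix fails with "non-conformable arrays"
res <- try(ds - bad, silent = TRUE)
inherits(res, "try-error")
```

This is consistent with deriving v from the data (v = ncol(dataset)) instead of exposing it as an argument.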

brichard1638 commented 9 months ago

I think the objective here is to replicate the algorithm without using the matrix function. I have not examined the algorithm deeply enough to make a recommendation or to write dummy code that would replicate it without the matrix function.

Sent by email in reply to Obi Obianom's message of February 19, 2024, 11:42 PM.
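The same distances can be computed without an explicit `matrix()` call. A minimal sketch, assuming all rows and columns are used and using made-up data: base R's `sweep()` subtracts each column mean from its column, and `scale(center = TRUE, scale = FALSE)` centers the columns without rescaling them. The original matrix-based expression is included as a check.

```r
# Made-up dataset: 5 observations of 3 numeric variables
ds <- as.matrix(data.frame(
  temp     = c(20.1, 21.4, 19.8, 22.0, 20.6),
  humidity = c(45, 50, 48, 52, 47),
  pressure = c(1012, 1010, 1013, 1009, 1011)
))
n <- nrow(ds)
v <- ncol(ds)

# Original expression (as in svgViewR), using matrix() to build the centroid rows
d_matrix <- sqrt(rowSums((ds - matrix(colMeans(ds), n, v, byrow = TRUE))^2))

# Alternative 1: sweep() subtracts each column mean from its column
d_sweep <- sqrt(rowSums(sweep(ds, 2, colMeans(ds))^2))

# Alternative 2: scale() centers each column without rescaling it
d_scale <- sqrt(rowSums(scale(ds, center = TRUE, scale = FALSE)^2))

all.equal(d_matrix, d_sweep)
all.equal(d_matrix, d_scale)
```

Either alternative removes the need for n and v entirely, since both are implied by the shape of the data.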


brichard1638 commented 9 months ago

If you can't get the function to work properly, perhaps this function should be deferred to another time, and not included in the next version of quickcode.


Sent by email in reply to Obi Obianom's message of Monday, February 19, 2024, 11:42 PM.


oobianom commented 9 months ago

I will give it a few days and see if I can find a solution. In the meantime, can you provide me with the datasets and code for all the tests that failed?

brichard1638 commented 9 months ago

The attached file combines the datasets I was using to test the pairDist function. I was testing the function with a mix of data frame and matrix/array objects.

I didn't save the errors, as the most significant ones were already recorded in this issue.


Sent by email in reply to Obi Obianom's message of Wednesday, February 21, 2024, 4:58 AM.


oobianom commented 9 months ago

There is actually no attachment there. Can you edit the comment and attach it, or email me at idonshayo@gmail.com?

oobianom commented 9 months ago

Hi Brice, I have now implemented the new revisions you sent for pairDist and updated them in this repository. I will run through the above examples, and the one you sent by email, later today. Hopefully we can have the package ready to send out by tomorrow.

brichard1638 commented 9 months ago

This is great news!

Thank you for the update.


Sent by email in reply to Obi Obianom's message of Saturday, March 2, 2024, 4:28 AM.


oobianom commented 9 months ago

Hi Brice, I have now tested all the examples with the data passed as as.data.frame(data) and as.matrix(data). All of them ran successfully!

I will finalize all the revisions and documentation and send the package out by tomorrow for publication on CRAN.
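A check along the lines described here can be scripted so it is easy to rerun before a CRAN submission. This is a hypothetical sketch: `pair_dist` is a stand-in written for illustration, not the function shipped in quickcode, and the data are made up. It confirms that passing the same values through `as.data.frame()` and `as.matrix()` gives identical distances.

```r
# Hypothetical stand-in for pairDist(): distance of each row from the column means
pair_dist <- function(data) {
  ds <- as.matrix(data)  # accepts a data frame or a matrix
  d <- sqrt(rowSums(sweep(ds, 2, colMeans(ds))^2))
  names(d) <- seq_len(nrow(ds))  # name each distance by its row number
  d
}

# Made-up data: 4 observations of 3 numeric variables
dat <- data.frame(x = c(1, 4, 7, 10), y = c(2, 6, 3, 5), z = c(0, 1, 0, 1))

res_df <- pair_dist(as.data.frame(dat))
res_mx <- pair_dist(as.matrix(dat))

identical(res_df, res_mx)  # same input values, so the same distances
```

Converting the input with `as.matrix()` at the start is what makes the two input classes behave the same; a function that did arithmetic on the data frame directly would need its own check.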

brichard1638 commented 9 months ago

Obi:::

Great! I am anxious to start using the next version of the quickcode package!


Sent by email in reply to Obi Obianom's message of Saturday, March 2, 2024, 6:10 PM.
