matsuken92 / molecular

Feature management #5

Open matsuken92 opened 5 years ago

matsuken92 commented 5 years ago

Not good.

matsuken92 commented 5 years ago

scp -i ~/key/google_compute_engine -r 10.138.0.8:/home/kenichi.matsui/work/molecular/processed/v003/train_005.df.pkl ./
scp -i ~/key/google_compute_engine -r 10.138.0.8:/home/kenichi.matsui/work/molecular/processed/v003/test_005.df.pkl ./
scp -i ~/key/google_compute_engine -r 10.138.0.8:/home/kenichi.matsui/work/molecular/processed/v003/train_006.df.pkl ./
scp -i ~/key/google_compute_engine -r 10.138.0.8:/home/kenichi.matsui/work/molecular/processed/v003/test_006.df.pkl ./

matsuken92 commented 5 years ago

3D rotation

So far this has not worked... (tried it as data augmentation)

source: http://bluewidz.blogspot.com/2017/09/blog-post_30.html

import math

import numpy as np
import pandas as pd
from tqdm import tqdm

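def matrics_rotate_(row, theta):
    # Rotate one atom's (x, y, z) with the rotation matrix built from a unit
    # quaternion; theta = (x0, y1, y2) parameterizes that quaternion.
    # For a uniformly random rotation: x0 in [0, 1), y1 and y2 in [0, 2*pi).
    # x0 = np.random.random()
    # y1 = 2*math.pi*np.random.random()
    # y2 = 2*math.pi*np.random.random()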
    x0 = theta[0]
    y1 = theta[1]
    y2 = theta[2]
    r1 = math.sqrt(1.0-x0)
    r2 = math.sqrt(x0)
    u0 = math.cos(y2)*r2
    u1 = math.sin(y1)*r1
    u2 = math.cos(y1)*r1
    u3 = math.sin(y2)*r2
    coefi = 2.0*u0*u0-1.0
    coefuu = 2.0
    coefe = 2.0*u0
    r = np.zeros(shape=(3, 3))
    r[0, 0] = coefi+coefuu*u1*u1
    r[1, 1] = coefi+coefuu*u2*u2
    r[2, 2] = coefi+coefuu*u3*u3

    r[1, 2] = coefuu*u2*u3-coefe*u1
    r[2, 0] = coefuu*u3*u1-coefe*u2
    r[0, 1] = coefuu*u1*u2-coefe*u3

    r[2, 1] = coefuu*u3*u2+coefe*u1
    r[0, 2] = coefuu*u1*u3+coefe*u2
    r[1, 0] = coefuu*u2*u1+coefe*u3
    return np.dot([row.x, row.y, row.z], r)

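# Seed NumPy's generator, since np.random is what draws theta below
# (SEED and structures are defined elsewhere).
np.random.seed(SEED)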
rotated_structures = []
for i, g in tqdm(structures.groupby("molecule_name")):
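    # One rotation per molecule; the angles span a full turn in radians,
    # matching the commented-out sampling in matrics_rotate_.
    theta = [np.random.random(), 2*math.pi*np.random.random(), 2*math.pi*np.random.random()]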
    rotated = g.apply(lambda row: matrics_rotate_(row, theta), axis=1, result_type="expand")
    rotated.columns = ["x", "y", "z"]
    rotated["molecule_name"] = i
    rotated["atom_index"] = g.atom_index
    rotated["atom"] = g.atom
    rotated_structures += [rotated]
    # break

rotated_structures_df = pd.concat(rotated_structures, axis=0)
rotated_structures_df[structures.columns].to_csv(f"../input/rotated_structures_{SEED}.csv")
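
A quick sanity check for the rotation above, as a minimal dependency-free sketch rather than the code used in this issue. It rebuilds the same matrix from the formulas in `matrics_rotate_` and checks that it is orthogonal and that it preserves the distance between two points. The names `rotation_matrix`, `rotate`, `distance` and the two sample coordinates are hypothetical, chosen only for illustration.

```python
import math
import random


def rotation_matrix(x0, y1, y2):
    """Build the 3x3 matrix using the same formulas as matrics_rotate_."""
    r1 = math.sqrt(1.0 - x0)
    r2 = math.sqrt(x0)
    u0 = math.cos(y2) * r2
    u1 = math.sin(y1) * r1
    u2 = math.cos(y1) * r1
    u3 = math.sin(y2) * r2
    c = 2.0 * u0 * u0 - 1.0
    return [
        [c + 2 * u1 * u1, 2 * u1 * u2 - 2 * u0 * u3, 2 * u1 * u3 + 2 * u0 * u2],
        [2 * u2 * u1 + 2 * u0 * u3, c + 2 * u2 * u2, 2 * u2 * u3 - 2 * u0 * u1],
        [2 * u3 * u1 - 2 * u0 * u2, 2 * u3 * u2 + 2 * u0 * u1, c + 2 * u3 * u3],
    ]


def rotate(point, r):
    """Row vector times matrix, like np.dot([x, y, z], r)."""
    return [sum(point[i] * r[i][j] for i in range(3)) for j in range(3)]


def distance(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


rng = random.Random(0)
theta = (rng.random(), 2 * math.pi * rng.random(), 2 * math.pi * rng.random())
r = rotation_matrix(*theta)

# Hypothetical coordinates of two atoms in one molecule.
a = [0.0, 0.0, 0.0]
b = [1.09, 0.3, -0.5]

# r times its transpose should be the identity if r is a rotation.
gram = [[sum(r[i][k] * r[j][k] for k in range(3)) for j in range(3)] for i in range(3)]

print("distance before:", distance(a, b))
print("distance after: ", distance(rotate(a, r), rotate(b, r)))
print("r @ r.T:", gram)
```

A rigid rotation leaves every interatomic distance unchanged, so any feature computed only from distances or angles between atoms is identical before and after rotation. That may be worth keeping in mind when judging whether rotated copies add information as augmentation.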

matsuken92 commented 5 years ago

Feature creation procedure

  1. https://www.kaggle.com/kenmatsu4/openbabel-feature
  2. https://www.kaggle.com/kenmatsu4/estimation-of-mulliken-charges-with-open-babel
  3. ../notebook/rdkit-feature-augmented.ipynb
  4. ../notebook/feature_eng_002.ipynb
  5. ../notebook/feature_eng_006.ipynb
  6. ../src/coulomb_interaction_feat.py, concat_coulomb_feat.py
  7. ../notebook/bond-calculation-feat.ipynb

matsuken92 commented 5 years ago

TDA feature

https://www.kaggle.com/kenmatsu4/topological-data-analysis-ver-002

[Number of circles found]

circles  count
0         9152
1        40340
2        45624
3        23631
4         8443
5         2845
6          612
7          100
8           24
9            4
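
The counts above can be summarized with a few lines of Python. This is a minimal sketch that only restates the numbers from this comment. The variable names are hypothetical, and reading each count as a number of molecules is an assumption.

```python
# Number of circles found -> count, copied from the comment above.
counts = {
    0: 9152, 1: 40340, 2: 45624, 3: 23631, 4: 8443,
    5: 2845, 6: 612, 7: 100, 8: 24, 9: 4,
}

total = sum(counts.values())
shares = {k: n / total for k, n in counts.items()}
mean_circles = sum(k * n for k, n in counts.items()) / total
most_common = max(counts, key=counts.get)

for k in sorted(counts):
    print(f"{k}: {counts[k]:>6} ({shares[k]:.1%})")
print("total:", total)
print("mean circles:", round(mean_circles, 3))
print("most common value:", most_common)
```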